In [None]:
import pandas as pd

In [None]:
pd

# 1. Create

### 1.1 Create from a CSV

In [None]:
df = pd.read_csv('telco_churn.csv')

### 1.2 Create from a Dictionary

In [None]:
tempdict = {'col1':[1,2,3], 'col2':[4,5,6], 'col3':[7,8,9]}

This dictionary has three key-value pairs: each key is a column name, and the corresponding value is the list of values for that column.

In [None]:
dictdf = pd.DataFrame.from_dict(tempdict)

In [None]:
print(dictdf)
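
For a dictionary of equal-length lists like this one, the plain `pd.DataFrame` constructor should give the same result as `from_dict`. A minimal sketch that reuses the same `tempdict`; the name `dictdf_alt` is made up for illustration:

```python
import pandas as pd

tempdict = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]}

# Keys become column names; each list becomes that column's values.
dictdf = pd.DataFrame.from_dict(tempdict)

# The constructor accepts the same mapping directly.
dictdf_alt = pd.DataFrame(tempdict)

print(dictdf.shape)               # (3, 3)
print(dictdf.equals(dictdf_alt))  # True
```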

# 2. Read

### 2.1 Show Top and Bottom Rows

In [None]:
df.head(10)

In [None]:
dictdf.head()

In [None]:
df.tail(15)

### 2.2 Show Columns and Data Type

In [None]:
df.columns

In [None]:
df.dtypes

### 2.3 Summary Statistics

In [None]:
df.describe()

The code df.describe(include='object') summarizes the object (string) columns of the DataFrame df. For each such column it reports the non-null count, the number of unique values, the most common value (top), and how often that value occurs (freq).

In [None]:
df.describe(include='object')

### 2.4 Filtering Columns

In [None]:
df.State

In [None]:
df['International plan']

In [None]:
df[['State', 'International plan']]

The code df.Churn.unique() returns the distinct values present in the 'Churn' column of the DataFrame df as an array.

In [None]:
df.Churn.unique()
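
As a small sketch of what `unique()` returns, and of `value_counts()` as a companion when the number of rows per value also matters. The `toy` DataFrame below is made-up stand-in data, not the telco file:

```python
import pandas as pd

# Made-up stand-in for the churn data.
toy = pd.DataFrame({
    'State': ['OH', 'NJ', 'OH', 'KS'],
    'Churn': [False, True, False, False],
})

# Distinct values, in order of first appearance.
distinct = toy.Churn.unique()

# Number of rows carrying each distinct value.
counts = toy['Churn'].value_counts()

print(distinct)
print(counts)
```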

### 2.5 Filtering on Rows

In [None]:
df.head()

The code df[df['International plan']=='No'] filters the DataFrame df to select rows where the 'International plan' column has the value 'No', effectively creating a new DataFrame with only those rows.

In [None]:
df[df['International plan']=='No']

The code filters the DataFrame df to select rows where 'International plan' is 'No' and 'Churn' is the boolean True, that is, customers who do not have an international plan and have churned (cancelled their subscription). Each condition is wrapped in parentheses because & binds more tightly than ==.

In [None]:
df[(df['International plan']=='No') & (df['Churn']==True)]
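
The combined filter can be easier to read when each condition is stored as a named boolean mask first. A minimal sketch on made-up data; `toy`, `no_plan`, and `churned` are illustrative names:

```python
import pandas as pd

# Made-up stand-in for the churn data.
toy = pd.DataFrame({
    'International plan': ['No', 'Yes', 'No', 'No'],
    'Churn': [False, True, True, False],
})

# Each comparison yields a boolean Series aligned with the rows.
no_plan = toy['International plan'] == 'No'
churned = toy['Churn']  # already boolean, so no comparison is needed

# & combines the masks element-wise; only rows where both are True survive.
result = toy[no_plan & churned]
print(result)
```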

### 2.6 Indexing with iloc

The code df.iloc[14] retrieves the 15th row (index 14) of a Pandas DataFrame df, providing access to all data in that specific row as a Pandas Series object.

In [None]:
df.iloc[14]

The code df.iloc[14, -1] accesses a specific cell in a Pandas DataFrame named df. It retrieves the value located at the 15th row (index 14) and the last column (index -1) of the DataFrame.

In [None]:
df.iloc[14,-1]

The code df.iloc[22:33] extracts rows from position 22 up to, but not including, position 33. The result therefore contains positions 22 through 32, which is 11 rows in total.

In [None]:
df.iloc[22:33]
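
The three `iloc` forms can be checked on a small frame whose values make the positions obvious. A sketch on made-up data (`toy` is not the telco file):

```python
import pandas as pd

# 40 rows; column 'a' equals the row position, column 'b' is position + 100.
toy = pd.DataFrame({'a': range(40), 'b': range(100, 140)})

row = toy.iloc[14]        # whole 15th row, returned as a Series
cell = toy.iloc[14, -1]   # 15th row, last column
block = toy.iloc[22:33]   # positions 22..32; the end position is excluded

print(cell)        # 114
print(len(block))  # 11
```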

### 2.7 Indexing with loc

The code creates a new DataFrame called state by making a copy of the original DataFrame df, and then it sets the "State" column as the index of the state DataFrame, replacing the default numeric index.

In [None]:
state = df.copy()
state.set_index('State', inplace=True)

In [None]:
state.head()

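The code state.loc['OH'] retrieves the rows of the DataFrame state whose index label is 'OH'. Because many customers can share the same state, the result is a DataFrame containing every matching row; a label that occurs only once would come back as a single Series.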

In [None]:
state.loc['OH']
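
Since a state label can repeat in the index, the shape of what `loc` returns depends on how many rows match. A minimal sketch on made-up data; `toy` and `by_state` are illustrative names:

```python
import pandas as pd

# Made-up stand-in for the churn data.
toy = pd.DataFrame({
    'State': ['OH', 'NJ', 'OH', 'KS'],
    'Churn': [False, True, False, False],
})
by_state = toy.set_index('State')

oh = by_state.loc['OH']  # label appears twice: a DataFrame with two rows
ks = by_state.loc['KS']  # label appears once: a single row as a Series

print(type(oh).__name__, len(oh))
print(type(ks).__name__)
```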

# 3. Update

### 3.1 Dropping Rows

The code df.isnull().sum() calculates the number of missing (null) values in each column of the DataFrame df and returns a summary of the count of null values for each column.

In [None]:
df.isnull().sum()

The code df.dropna(inplace=True) removes rows with missing (NaN) values from the DataFrame df and modifies it in place (changes are applied to the DataFrame without creating a new copy).

In [None]:
df.dropna(inplace=True)

Running df.isnull().sum() again confirms the result of the drop: every column should now report 0 missing values.

In [None]:
df.isnull().sum()
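
The count, drop, and re-count sequence can also be written without `inplace=True` by keeping the value that `dropna()` returns. A sketch on made-up data (`toy` and `cleaned` are illustrative names):

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value per column.
toy = pd.DataFrame({
    'minutes': [10.0, np.nan, 30.0],
    'plan': ['No', 'Yes', None],
})

missing_before = toy.isnull().sum()

# dropna() returns a new DataFrame; toy itself is left untouched.
cleaned = toy.dropna()

missing_after = cleaned.isnull().sum()

print(missing_before)
print(missing_after)
```

Keeping the original around this way makes it possible to compare row counts before and after the drop.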

### 3.2 Dropping Columns

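The code df.drop('Area code', axis=1) returns a new DataFrame without the 'Area code' column; axis=1 tells pandas to look for the label among the columns rather than the rows. The original df is not changed unless the result is assigned back (df = df.drop('Area code', axis=1)) or inplace=True is passed.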

In [None]:
df.drop('Area code', axis=1)
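
Note that `drop` returns a new DataFrame and leaves the original alone unless the result is assigned back. A minimal sketch on made-up data; `toy` and `without_code` are illustrative names:

```python
import pandas as pd

# Made-up stand-in for the churn data.
toy = pd.DataFrame({
    'State': ['OH', 'NJ'],
    'Area code': [415, 408],
    'Churn': [False, True],
})

# Returns a copy without the column; equivalent to toy.drop(columns='Area code').
without_code = toy.drop('Area code', axis=1)

print(list(toy.columns))           # ['State', 'Area code', 'Churn']
print(list(without_code.columns))  # ['State', 'Churn']
```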

### 3.3 Creating Calculated Columns

The code adds a new column named 'New Column' to the DataFrame 'df', which contains the sum of values from the 'Total night minutes' column and the 'Total intl minutes' column for each row in the DataFrame.

In [None]:
df['New Column'] = df['Total night minutes'] + df['Total intl minutes']

In [None]:
df.head()

### 3.4 Updating an Entire Column


The code assigns the value 100 to every row of the "New Column" column of the DataFrame df. Because that column already exists, its previous values are overwritten; if it did not exist, it would be created.

In [None]:
df['New Column'] = 100

In [None]:
df.head()

### 3.5 Updating a Single Value

The code df.iloc[0, -1] = 10 sets the value at the first row and last column of a Pandas DataFrame df to the value 10. It uses integer-based indexing with iloc, where the first argument refers to the row index (0 in this case), and the second argument, -1, represents the last column.

In [None]:
df.iloc[0,-1] = 10

In [None]:
df.head()
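
Position-based updates depend on the column order, so a label-based update with `loc` may be the safer choice when the column name is known. A sketch on made-up data (`toy` is illustrative):

```python
import pandas as pd

# Made-up frame whose last column plays the role of 'New Column'.
toy = pd.DataFrame({
    'minutes': [1.5, 2.5],
    'New Column': [100, 100],
})

# Position-based: first row, last column (whatever that column happens to be).
toy.iloc[0, -1] = 10

# Label-based: row label 1, column named 'New Column'.
toy.loc[1, 'New Column'] = 20

print(toy['New Column'].tolist())  # [10, 20]
```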

### 3.6 Condition-Based Updating Using Apply

The code adds a new column named 'Churn Binary' to the DataFrame df. It assigns 1 where the 'Churn' column is True and 0 where it is False, converting a boolean column into a binary one (1 for True, 0 for False) for analysis.

In [None]:
df['Churn Binary'] = df['Churn'].apply(lambda x: 1 if x==True else 0)
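
For a column that is already boolean, a vectorized cast should give the same result as the `apply` call without running a Python function per row. A sketch on made-up data; the column name 'Churn Binary Alt' is invented for the comparison:

```python
import pandas as pd

# Made-up boolean column standing in for 'Churn'.
toy = pd.DataFrame({'Churn': [True, False, True]})

# Row-by-row version, as in the cell above.
toy['Churn Binary'] = toy['Churn'].apply(lambda x: 1 if x == True else 0)

# Vectorized alternative: True becomes 1, False becomes 0.
toy['Churn Binary Alt'] = toy['Churn'].astype(int)

print(toy)
```

The `apply` form remains useful when the condition is more involved than a simple True/False cast.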

The code uses boolean indexing to keep only the rows of df where the 'Churn' column is True, then calls head() to display the first five rows of the filtered DataFrame.

In [None]:
df[df['Churn']==True].head()

# 4. Delete/Output

### 4.1 Output to CSV

In [None]:
df.to_csv('output.csv')

### 4.2 Output to JSON

In [None]:
df.to_json()

### 4.3 Output to HTML

In [None]:
df.to_html()
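
When no file path is given, `to_csv`, `to_json`, and `to_html` each return the output as a string instead of writing a file. A minimal sketch on made-up data; `index=False` and `orient='records'` are optional choices made here, not settings used in the cells above:

```python
import json

import pandas as pd

# Made-up stand-in for the churn data.
toy = pd.DataFrame({'State': ['OH', 'NJ'], 'Churn': [False, True]})

# index=False leaves the row index out of the output.
csv_text = toy.to_csv(index=False)

# orient='records' produces one JSON object per row.
json_text = toy.to_json(orient='records')
records = json.loads(json_text)

html_text = toy.to_html(index=False)

print(csv_text)
print(records)
```

Passing a path as the first argument, as in `to_csv('output.csv')`, writes to disk instead.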

### 4.4 Delete a DataFrame

In [None]:
del df

Activity to try out.
Using the provided bank_marketing_dataset.csv dataset, perform the following tasks:
1) Generate summary statistics for the numerical columns in the dataset.
2) Return the first 20 rows and the last 10 rows of the DataFrame.
3) Create a new column.
