
In [None]:
import pandas as pd
import numpy as np

# Downloading and Loading Datasets
Download all the required CSV files and load the data into DataFrames.

In [None]:
# eCommerce Dataset
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/shopping_data_v2.csv

shopping_df = pd.read_csv('shopping_data_v2.csv')

In [None]:
# Covid Dataset
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/italy-covid-daywise.csv

covid_df = pd.read_csv('italy-covid-daywise.csv')

In [None]:
# Stackoverflow Survey Dataset
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/survey_results_public.csv

survey_df = pd.read_csv('survey_results_public.csv')

In [None]:
# Film Dataset
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/film.csv

films_df = pd.read_csv('film.csv')

# Modifying Column Names

In [None]:
people = {
    "First Name": ["Kristen", 'Maxine', 'John'],
    "Last Name": ["Carol", 'Williams', 'Smith'],
    "Email ID": ["KristenC@gmail.com", 'Maxine.Williams@email.com', 'JohnSmith@email.com']
}

In [None]:
people_df = pd.DataFrame(people)
people_df

In [None]:
people_df.columns

In [None]:
people_df.columns = ["F Name", "L Name", "Email"]
people_df

In [None]:
people_df.columns = people_df.columns.str.replace(' ', '_')
people_df

In [None]:
people_df.columns = [x.lower() for x in people_df.columns]
people_df
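The two steps above (replace spaces, then lowercase) can also be chained on the column `Index` through its vectorized `.str` accessor. A minimal sketch on a small, hypothetical DataFrame with the original messy headers:

```python
import pandas as pd

# Hypothetical frame with the same kind of headers people_df started with
df = pd.DataFrame(columns=["First Name", "Last Name", "Email ID"])

# Index.str works like Series.str: replace spaces, then lowercase, in one expression
df.columns = df.columns.str.replace(" ", "_").str.lower()
print(list(df.columns))  # ['first_name', 'last_name', 'email_id']
```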

### pd.DataFrame.rename
* `pd.DataFrame.rename(mapper=None, index=None, columns=None, axis=None, inplace=False)`
  *   `mapper` is the dict-like or function transformation to apply to that axis' labels.
  *   `axis` is the axis to target with `mapper` (0 or `'index'`, 1 or `'columns'`).
  *   `index` is an alternative to specifying `axis` (`mapper, axis=0` is equivalent to `index=mapper`).
  *   `columns` is an alternative to specifying `axis` (`mapper, axis=1` is equivalent to `columns=mapper`).
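The parameters above accept either a dict or a function, and `index=` renames row labels the same way `columns=` renames column labels. A small sketch on a hypothetical DataFrame (not the shopping dataset):

```python
import pandas as pd

df = pd.DataFrame({"Price Each": [11.95, 99.99], "Product": ["Cable", "Headphones"]},
                  index=["a", "b"])

# Dict mapper on columns: keys that don't exist are silently ignored (errors='ignore' is the default)
by_dict = df.rename(columns={"Price Each": "Price", "Missing": "X"})

# A function mapper is applied to every label on the chosen axis
by_func = df.rename(columns=str.upper)

# Row labels are renamed the same way through index= (equivalent to mapper=..., axis=0)
by_index = df.rename(index={"a": "first"})

print(list(by_dict.columns))  # ['Price', 'Product']
print(list(by_func.columns))  # ['PRICE EACH', 'PRODUCT']
print(list(by_index.index))   # ['first', 'b']
```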



In [None]:
shopping_df

In [None]:
shopping_df.rename(columns={'Price Each':'Price', 'Product':'Product Name'})

In [None]:
# Alternate way of renaming columns
shopping_df.rename(mapper={'Price Each':'Price', 'Product':'Product Name'}, axis=1)

**`rename` returns a new DataFrame and leaves the original untouched. After checking the result, use `inplace=True` (or reassign the result) to apply the changes to the actual DataFrame.**

In [None]:
shopping_df.rename(columns={'Price Each':'Price', 'Product':'Product Name'}, inplace=True)

In [None]:
shopping_df

# Modifying Rows

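Trying to update a value through chained indexing, as shown below, does not change `people_df`: `people_df[filt]` returns a copy, so the assignment only modifies that temporary copy and pandas emits a `SettingWithCopyWarning` (or a `ChainedAssignmentError` warning when Copy-on-Write is enabled). Instead, **values should be updated using `loc` or `iloc`.**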

In [None]:
# Chained indexing: boolean indexing returns a copy, so this assignment never reaches people_df
filt = (people_df['email'] == 'KristenC@gmail.com')
people_df[filt]['l_name'] = 'Smith'

In [None]:
people_df

### Updating values with `loc`

In [None]:
people_df.loc[1] = ['Will', 'Tatum', 'Will.T@gmail.com']
people_df

In [None]:
people_df.loc[2, ['f_name', 'email']] = ['Johnny', 'JohnnySmith@email.com']
people_df

In [None]:
people_df.loc[[1, 2], ['f_name']] = 'Apple'
people_df
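The working version of the chained assignment attempted earlier passes the boolean mask and the column label to `loc` in a single indexing call; `iloc` does the same by integer position. A sketch on a fresh, hypothetical copy of the people data:

```python
import pandas as pd

df = pd.DataFrame({
    "f_name": ["Kristen", "Maxine", "John"],
    "l_name": ["Carol", "Williams", "Smith"],
    "email": ["KristenC@gmail.com", "Maxine.Williams@email.com", "JohnSmith@email.com"],
})

# Row selector (boolean mask) and column selector in ONE loc call: this modifies df itself
filt = df["email"] == "KristenC@gmail.com"
df.loc[filt, "l_name"] = "Smith"

# iloc addresses cells by integer position instead of label: row 1, column 0 (f_name)
df.iloc[1, 0] = "Max"

print(df)
```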

# Methods to Update Data

### pd.Series.apply
* `pd.Series.apply(func, convert_dtype=True, args=(), **kwds)`
  *   Calls `func` on each value in the Series and returns the results.
  *   `func` is the Python function or NumPy ufunc to apply.
  *   `convert_dtype` tries to find a better dtype for the results; it is deprecated since pandas 2.1.
  *   `args` are the positional arguments passed to `func` after the Series value.
  *   `**kwds` are additional keyword arguments passed to `func`.
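To see how `args` and extra keyword arguments reach the function, here is a sketch with a hypothetical `add_tax` helper (the helper, rate, and fee are illustrative only): the Series value arrives as the first argument, `args` fills the next positional parameters, and remaining keywords are forwarded.

```python
import pandas as pd

prices = pd.Series([10.0, 20.0, 30.0])

# Hypothetical helper: the Series value comes first, extra arguments follow
def add_tax(price, rate, flat_fee=0.0):
    return round(price * (1 + rate) + flat_fee, 2)

# args=(0.1,) supplies `rate`; flat_fee=1.0 is forwarded as a keyword argument
taxed = prices.apply(add_tax, args=(0.1,), flat_fee=1.0)
print(taxed.tolist())  # [12.0, 23.0, 34.0]
```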

In [None]:
def change_product_name(product_name):
    """Lowercase a product name and replace spaces with underscores."""
    new_name = product_name.lower()
    new_name = new_name.replace(" ", "_")
    return new_name

In [None]:
shopping_df["Product Name"].apply(change_product_name)

**Lambda functions work too**

In [None]:
shopping_df["Price"].apply(lambda x: x + 5)

In [None]:
# The column was renamed to 'Product Name' above; 'Product' would raise a KeyError
shopping_df["Product Name"].apply(len)

`apply` can be used directly on DataFrames too, but the objects passed to the function are `Series` objects: each column by default, or each row with `axis=1`.

In [None]:
shopping_df.apply(len)
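The difference between the default column-wise call and a row-wise call with `axis=1` is easiest to see on a tiny, hypothetical orders table (the values below are made up and are not from the shopping dataset):

```python
import pandas as pd

# Hypothetical order data
orders = pd.DataFrame({
    "Quantity Ordered": [2, 1, 3],
    "Price": [10, 100, 3],
})

# Default axis=0: len receives each COLUMN as a Series
col_lengths = orders.apply(len)
print(col_lengths.tolist())  # [3, 3]

# axis=1: the function receives each ROW as a Series indexed by column name
totals = orders.apply(lambda row: row["Quantity Ordered"] * row["Price"], axis=1)
print(totals.tolist())  # [20, 100, 9]
```

For plain arithmetic like this, the vectorized `orders["Quantity Ordered"] * orders["Price"]` gives the same result and is usually much faster; row-wise `apply` is mainly worth it when the logic cannot be vectorized.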

### pd.DataFrame.applymap
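* `pd.DataFrame.applymap(func)`
  *   Applies the function to every element of the DataFrame.
  *   `func` is the Python function to apply; it receives a single value and returns a single value.
  *   Since pandas 2.1, `applymap` is deprecated and renamed to `pd.DataFrame.map`, which takes the same arguments.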

In [None]:
people = {
    "full_name": ["Jack Smith", 'Jane Lodge', 'John Doe', 'Kristen Carol'],
    "email": ["JackSmith@gmail.com", 'JaneLodge@email.com', 'JohnDoe@email.com', 'KristenC@email.com']
}

In [None]:
people_df = pd.DataFrame(people)
people_df

In [None]:
people_df.applymap(len)

In [None]:
people_df.applymap(str.lower)

### pd.Series.map
* `pd.Series.map(arg)`
  *   Maps each value of the Series to a new value according to `arg`.
  *   `arg` is the mapping correspondence used to substitute each value. It may be a function, a dict, or a Series.

**Values in the Series that are NOT in the dictionary are converted to `NaN`.**

In [None]:
shopping_df

In [None]:
shopping_df['Product Name'].map({'Google Phone':'Google Pixel', 'iPhone':'iPhone 6'})
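Because unmatched values become `NaN`, a common pattern is to fall back to the original value with `fillna`. `map` also accepts a function. A sketch on a small, hypothetical Series of product names:

```python
import pandas as pd

products = pd.Series(["Google Phone", "iPhone", "Wired Headphones"])
renames = {"Google Phone": "Google Pixel", "iPhone": "iPhone 6"}

# map with a dict: values missing from the dict become NaN
mapped = products.map(renames)

# Fall back to the original value wherever the dict had no entry
kept = products.map(renames).fillna(products)

# map also accepts a function, applied to every element
lengths = products.map(len)

print(mapped.tolist())   # ['Google Pixel', 'iPhone 6', nan]
print(kept.tolist())     # ['Google Pixel', 'iPhone 6', 'Wired Headphones']
print(lengths.tolist())  # [12, 6, 16]
```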

### pd.Series.replace
* `pd.Series.replace(to_replace=None, value=None, inplace=False)`
  *   Replaces the values given in `to_replace` with `value`. Unlike `map`, values that match nothing are left unchanged.
  *   `to_replace` is/are the value(s) to replace. It may be a str, list, dict, Series, int, etc.
  *   `value` is the value that replaces anything matching `to_replace`.


In [None]:
shopping_df['Product Name'].replace({'Google Phone':'Google Pixel', 'iPhone':'iPhone 6'})
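By default `replace` only matches whole values; with `regex=True` a pattern can match part of a string and only that part is substituted. A sketch on a hypothetical Series (the replacement text `" Charger"` is just an illustration):

```python
import pandas as pd

products = pd.Series(["USB-C Charging Cable", "iPhone", "Lightning Charging Cable"])

# Without regex, only WHOLE values are matched; everything else is kept as-is
exact = products.replace({"iPhone": "iPhone 6"})

# With regex=True, the pattern can match a substring and only that part is replaced
partial = products.replace(r" Charging Cable$", " Charger", regex=True)

print(exact.tolist())    # ['USB-C Charging Cable', 'iPhone 6', 'Lightning Charging Cable']
print(partial.tolist())  # ['USB-C Charger', 'iPhone', 'Lightning Charger']
```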

**`replace` can be used with DataFrames too**


In [None]:
shopping_df.replace(to_replace=["USB-C Charging Cable", "Lightning Charging Cable"], value="Charging Cable")

# Add/Delete Columns

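**Create a new column by assigning to a new column label. A scalar value is broadcast to every row.**

In [None]:
# A scalar fills every row; pd.Series('Amazon') would only fill row 0 because of index alignment
shopping_df['Store'] = 'Amazon'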
shopping_df
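Assigning a scalar and assigning a Series behave differently: a Series is aligned on the index, so a one-element `pd.Series('Amazon')` only fills row 0 and leaves every other row as `NaN`. `assign` is another way to add columns that returns a new DataFrame. A sketch on a small, hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Cable", "Phone", "Monitor"]})

# A scalar is broadcast to every row
df["Store"] = "Amazon"

# A Series is aligned on the index: only label 0 exists in pd.Series("Amazon"),
# so rows 1 and 2 get NaN
df["Store_series"] = pd.Series("Amazon")

# assign() returns a NEW DataFrame with the extra column; df itself is unchanged
df2 = df.assign(Product_upper=df["Product"].str.upper())

print(df)
```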

In [None]:
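# Combine existing columns into a new one (every referenced column must already exist, otherwise KeyError)
people_df['contact'] = people_df['full_name'] + ' <' + people_df['email'] + '>'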
people_df

### Splitting columns


In [None]:
people = {
    "full_name": ["Jack Smith", 'Jane Lodge', 'John Doe', 'Kristen Carol'],
    "email": ["JackSmith@gmail.com", 'JaneLodge@email.com', 'JohnDoe@email.com', 'KristenC@email.com']
}

In [None]:
people_df = pd.DataFrame(people)
people_df

In [None]:
people_df['full_name'].str.split(' ')

In [None]:
people_df['full_name'].str.split(' ', expand=True)

In [None]:
people_df[['first', 'last']] = people_df['full_name'].str.split(' ', expand=True)
people_df
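The split above assumes every name has exactly one space. If a name has more words, `expand=True` produces extra columns and the two-column assignment fails. Limiting the number of splits with `n` avoids that. A sketch with a hypothetical three-word name:

```python
import pandas as pd

df = pd.DataFrame({"full_name": ["Jack Smith", "Mary Ann Lee"]})

# Without n, "Mary Ann Lee" splits into three parts, so expand=True yields three columns
print(df["full_name"].str.split(" ", expand=True).shape)  # (2, 3)

# n=1 performs at most ONE split from the left -> always two columns
df[["first", "rest"]] = df["full_name"].str.split(" ", n=1, expand=True)

# rsplit with n=1 splits once from the RIGHT, keeping a multi-word given name together
df[["given", "last"]] = df["full_name"].str.rsplit(" ", n=1, expand=True)

print(df)
```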

### pd.concat
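* `pd.concat(objs, axis=0, ignore_index=False, keys=None)`
  * Concatenates pandas objects along a particular axis.
  * `objs` is a sequence or mapping of Series or DataFrame objects.
  * `axis` is the axis to concatenate along: 0 (the default) stacks rows, 1 places the objects side by side as columns.
  * If `ignore_index` is True, the index values along the concatenation axis are discarded and the resulting axis is labeled 0, …, n - 1.
  * `keys` assigns a label to each object; with `axis=1` and Series inputs, these labels become the column names.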


In [None]:
people_df

In [None]:
age_and_hobbies = {
    "age": [35, 17, 21, 45],
    "hobbies": ["painting", 'football', 'running', 'fishing']
}
age_and_hobbies_df = pd.DataFrame(age_and_hobbies)

In [None]:
pd.concat([people_df, age_and_hobbies_df], axis=1)

In [None]:
pd.concat([people_df, age_and_hobbies_df], axis=1, ignore_index=True)
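The examples above concatenate side by side (`axis=1`). With the default `axis=0` the rows are stacked instead, which is where `ignore_index` and `keys` matter most. A sketch with two hypothetical monthly order tables:

```python
import pandas as pd

jan = pd.DataFrame({"Product": ["Cable", "Phone"], "Quantity": [2, 1]})
feb = pd.DataFrame({"Product": ["Monitor"], "Quantity": [4]})

# axis=0 (the default) stacks rows and keeps the original indexes, so labels repeat
stacked = pd.concat([jan, feb])
print(stacked.index.tolist())  # [0, 1, 0]

# ignore_index=True renumbers the result 0..n-1
renumbered = pd.concat([jan, feb], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]

# keys= labels each piece, producing a two-level MultiIndex on the rows
labelled = pd.concat([jan, feb], keys=["jan", "feb"])
print(labelled.loc["feb"])
```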

In [None]:
name = np.array(['Alexis', 'Jonathan'])
gender = np.array(['Female', 'Male'])
name_series = pd.Series(name)
gender_series = pd.Series(gender)

In [None]:
user_df = pd.concat([name_series, gender_series], axis=1)
user_df

**You can label the resulting columns with the `keys` option.**

In [None]:
user_df = pd.concat([name_series, gender_series], axis=1, keys=['name', 'gender'])
user_df

### pd.DataFrame.drop
* `pd.DataFrame.drop(labels=None, axis=0, index=None, columns=None, inplace=False)`
  * `labels` is the index or column labels to drop.
  *  `axis` specifies the axis to drop the labels from.
  * `index` is an alternative to specifying the axis (labels, axis=0 is equivalent to index=labels).
  * `columns` is an alternative to specifying the axis (labels, axis=1 is equivalent to columns=labels).


In [None]:
shopping_df.drop(columns=['Quantity Ordered', 'Purchase Address'])

`axis=1` can also be written as `axis='columns'` (and `axis=0` as `axis='index'`).

In [None]:
# Alternate way of deleting columns
shopping_df.drop(labels='Order Date', axis='columns')
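`drop` removes rows as well as columns. Rows are dropped by index label, so dropping rows that match a condition means selecting their labels first. A sketch on a small, hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Cable", "Phone", "Monitor"], "Price": [11.95, 600.0, 109.99]})

# Drop a row by its index label
no_first = df.drop(index=0)

# Drop every row matching a condition: take the matching labels, then drop them
cheap_only = df.drop(df[df["Price"] > 500].index)

# errors='ignore' skips labels that don't exist instead of raising a KeyError
same = df.drop(columns=["Not A Column"], errors="ignore")
```

For the condition case, the boolean filter `df[df["Price"] <= 500]` gives the same rows; `drop` is handy when you already have the labels to remove.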

# Try It Yourself





For the following questions, use the **Stack Overflow** dataset.

**Note:** `NaN` values in the dataset just mean that the user did not respond to that in the survey.

0.   Load the dataset into a dataframe using `read_csv`.
1.   Rename the `ConvertedComp` label to `Salary` and the `MainBranch` label to `Stream`.
2.   Update the third row to have the same values for `Age` and `CompFreq` as the ninth row.
3.   Replace the country `Russian Federation` with `Russia` throughout the dataset.
4.   Update the country of the first respondent to USA.
5.   Convert the values in the `Gender` column to uppercase for every row.

In case you want to look up the schema of the dataset:

In [None]:
# Stackoverflow Schema Survey Dataset
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/survey_results_schema.csv

schema_survey_df = pd.read_csv('survey_results_schema.csv')