
In [None]:
import pandas as pd
import numpy as np

# Downloading and Loading Datasets
Download the required CSV files and load each one into a DataFrame.

In [None]:
# eCommerce Dataset
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/shopping_data_v2.csv

shopping_df = pd.read_csv('shopping_data_v2.csv')

In [None]:
# Covid Dataset
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/italy-covid-daywise.csv

covid_df = pd.read_csv('italy-covid-daywise.csv')

In [None]:
# Stackoverflow Survey Dataset
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/survey_results_public.csv

survey_df = pd.read_csv('survey_results_public.csv')

In [None]:
# Film Dataset
!wget https://nkb-backend-otg-media-static.s3.ap-south-1.amazonaws.com/otg_prod/media/Tech_4.0/AI_ML/Datasets/film.csv

films_df = pd.read_csv('film.csv')

In [None]:
people = {
    "first": ["Kristen", 'Maxine', 'John'],
    "last": ["Carol", 'Williams', 'Smith'],
    "email": ["KristenC@gmail.com", 'Maxine.Williams@email.com', 'JohnSmith@email.com']
}

people_df = pd.DataFrame(people)

# Add/Delete Columns

**We can create a new column by assigning a `Series` as shown below**

In [None]:
shopping_df

In [None]:
pd.Series(['Amazon', 'Flipkart', 'Walmart'])

In [None]:
shopping_df['Store'] = pd.Series(['Amazon', 'Flipkart', 'Walmart'])
shopping_df
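Assignment of a `Series` is aligned on index labels, not on position. The sketch below uses a small hypothetical frame (`orders_df` is not part of the shopping dataset) to show what happens when the `Series` is shorter than the DataFrame: rows without a matching label receive `NaN`.

```python
import pandas as pd

# Hypothetical toy frame with four rows (index labels 0..3).
orders_df = pd.DataFrame({"Product": ["Cable", "Phone", "Monitor", "Mouse"]})

# The Series has only three values, so it carries index labels 0, 1, 2.
orders_df["Store"] = pd.Series(["Amazon", "Flipkart", "Walmart"])

# Row 3 has no matching label in the Series, so its 'Store' value is NaN.
print(orders_df)
```

This is why the `Store` column in `shopping_df` is filled only for the first three rows.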

**Creating a new `Series` and adding it to a `DataFrame`**

In [None]:
people_df

In [None]:
(people_df['first'] + ' ' + people_df['last'])

In [None]:
people_df['full_name'] = (people_df['first'] + ' ' + people_df['last'])
people_df

### Splitting columns


In [None]:
people = {
    "full_name": ["Jack Smith", 'Jane Lodge', 'John Doe', 'Kristen Carol'],
    "email": ["JackSmith@gmail.com", 'JaneLodge@email.com', 'JohnDoe@email.com', 'KristenC@email.com']
}

people_df = pd.DataFrame(people)
people_df

In [None]:
people_df['full_name'].str.split(' ')

In [None]:
people_df['full_name'].str.split(' ', expand=True)

In [None]:
people_df[['first', 'last']] = people_df['full_name'].str.split(' ', expand=True)
people_df
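The two-column assignment above works because every name contains exactly one space. A minimal sketch, using a hypothetical `names` Series, of how the `n` argument of `str.split` limits the number of splits so that longer names still produce two columns:

```python
import pandas as pd

# Hypothetical names; the second one contains two spaces.
names = pd.Series(["Jack Smith", "Mary Ann Lee"])

# n=1 splits only at the first space, so the result always has two columns.
parts = names.str.split(" ", n=1, expand=True)
parts.columns = ["first", "last"]
print(parts)
```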

### pd.concat
* `pd.concat(objs, axis=0, ignore_index=False, keys=None)`
  * Concatenates pandas objects along a particular axis.
  * `objs` is a sequence or mapping of Series or DataFrame objects.
  * `axis` is the axis to concatenate along.
  * If `ignore_index` is True, the index values along the concatenation axis are not used. The resulting axis will be labeled 0, …, n - 1.


In [None]:
people_df

In [None]:
age_and_hobbies = {
    "age": [35, 17, 21, 45],
    "hobbies": ["painting", 'football', 'running', 'fishing']
}

age_and_hobbies_df = pd.DataFrame(age_and_hobbies)
age_and_hobbies_df

In [None]:
pd.concat([people_df, age_and_hobbies_df], axis="columns")

**We can also concatenate two `Series` objects**

In [None]:
name = np.array(['Alexis', 'Jonathan'])
gender = np.array(['Female', 'Male'])

name_series = pd.Series(name)
gender_series = pd.Series(gender)

In [None]:
user_df = pd.concat([name_series, gender_series], axis=1)
user_df

**You can also create column names using the `keys` option**

In [None]:
user_df = pd.concat([name_series, gender_series], axis=1, keys=['name', 'gender'])
user_df
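The examples above concatenate along the columns. As a small sketch of the default `axis=0` case and of `ignore_index`, the following stacks two hypothetical frames (`first_batch` and `second_batch` are illustrative names) row-wise:

```python
import pandas as pd

first_batch = pd.DataFrame({"name": ["Alexis", "Jonathan"]})
second_batch = pd.DataFrame({"name": ["Priya"]})

# Default axis=0 stacks rows and keeps each frame's original index labels.
stacked = pd.concat([first_batch, second_batch])

# ignore_index=True discards those labels and renumbers the rows from 0.
renumbered = pd.concat([first_batch, second_batch], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```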

### pd.DataFrame.drop
* `pd.DataFrame.drop(labels=None, axis=0, index=None, columns=None, inplace=False)`
  * `labels` is the index or column labels to drop.
  * `axis` specifies the axis to drop the labels from.
  * `index` is an alternative to specifying the axis (`labels, axis=0` is equivalent to `index=labels`).
  * `columns` is an alternative to specifying the axis (`labels, axis=1` is equivalent to `columns=labels`).


In [None]:
shopping_df

In [None]:
shopping_df.drop(columns=['Quantity Ordered', 'Purchase Address', 'Store'], inplace=True)

In [None]:
shopping_df

In [None]:
# Alternative way of deleting columns.
# Without inplace=True this returns a new DataFrame; shopping_df itself is unchanged.
shopping_df.drop(labels='Order Date', axis='columns')
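A short sketch of two behaviours of `drop` that are easy to miss, shown on a hypothetical one-row frame (the column `Discount` is deliberately absent): without `inplace=True` the original is left untouched, and `errors="ignore"` skips labels that do not exist instead of raising a `KeyError`.

```python
import pandas as pd

# Hypothetical toy frame, not the shopping dataset.
df = pd.DataFrame({"Product": ["Cable"], "Price": [11.95], "Store": ["Amazon"]})

# drop returns a new DataFrame; df keeps all three columns.
trimmed = df.drop(columns=["Store"])

# 'Discount' is not a column of df; errors="ignore" skips it silently.
safe = df.drop(columns=["Store", "Discount"], errors="ignore")

print(list(df.columns))
print(list(trimmed.columns))
print(list(safe.columns))
```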

# Add/Delete Rows

### pd.DataFrame.append
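* `pd.DataFrame.append(other, ignore_index=False, sort=False)`
  * Appends the rows of `other` to the end of the dataframe and returns a new object.
  * Columns in `other` that are not in the dataframe are added as new columns.
  * If `ignore_index` is True, the resulting axis will be labeled 0, 1, …, n - 1.
  * If `sort` is True, then it will sort the columns.
  * `append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the cells in this section need an older pandas version. `pd.concat` is the replacement.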



In [None]:
people = {
    "first": ["Jack", 'Jane', 'John', 'Kristen'],
    "last": ["Smith", 'Lodge', 'Doe', 'Carol'],
    "email": ["JackSmith@gmail.com", 'JaneLodge@email.com', 'JohnDoe@email.com', 'KristenC@email.com']
}

people_df = pd.DataFrame(people)
people_df

In pandas versions before 2.0, the following line raises a `TypeError`: a dict carries no index label for the new row, and `ignore_index` is not set.

In [None]:
people_df.append({'first':'Justin', 'last':'Timberlake', 'email': 'Justin.T@gmail.com'})

**If `ignore_index` is set to `True`, the row will be indexed automatically.**

In [None]:
people_df.append({'first':'Justin', 'last':'Timberlake', 'email': 'Justin.T@gmail.com'}, ignore_index=True)

**We can append a dataframe too**

In [None]:
people = {
    "first": ["Katy", 'Thomas'],
    "last" : ["Chap", "Rogers"],
    "email": ["KatyChap@gmail.com", 'ThomasRogers@email.com']
}

people_df_2 = pd.DataFrame(people)
people_df_2

In [None]:
people_df.append(people_df_2, ignore_index=True)

In [None]:
people_df.append(people_df_2, ignore_index=True, sort=True)
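`DataFrame.append` is not available in pandas 2.0 and later. A minimal sketch of the same row additions using `pd.concat`, assuming a shortened two-row version of `people_df`; the dict is wrapped in a one-row DataFrame first (`new_row` is an illustrative name):

```python
import pandas as pd

people_df = pd.DataFrame({
    "first": ["Jack", "Jane"],
    "last": ["Smith", "Lodge"],
    "email": ["JackSmith@gmail.com", "JaneLodge@email.com"],
})

# Wrap the dict in a list so it becomes a one-row DataFrame.
new_row = pd.DataFrame([
    {"first": "Justin", "last": "Timberlake", "email": "Justin.T@gmail.com"}
])

# ignore_index=True renumbers the rows 0..n-1, as with append(..., ignore_index=True).
combined = pd.concat([people_df, new_row], ignore_index=True)
print(combined)
```

Appending a whole DataFrame works the same way: pass both frames in the list given to `pd.concat`.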

### pd.DataFrame.drop

In [None]:
shopping_df

In [None]:
shopping_df.drop(index=[1, 2])

In [None]:
shopping_df.drop(labels=[4, 3], axis="index")

We can use `drop` with a `Series` too.

In [None]:
shopping_df['Product'].drop(labels=[1,2])
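Rows are often dropped by condition rather than by hard-coded labels. The following sketch, on a hypothetical `orders_df` with made-up prices, collects the index labels of the matching rows and passes them to `drop`; `reset_index(drop=True)` then renumbers the remaining rows.

```python
import pandas as pd

# Hypothetical toy data, not the shopping dataset.
orders_df = pd.DataFrame({
    "Product": ["Cable", "Phone", "Monitor", "Mouse"],
    "Price": [11.95, 600.0, 149.99, 25.0],
})

# Index labels of the rows that match the condition.
cheap_labels = orders_df[orders_df["Price"] < 50].index

# Dropping by label leaves gaps in the index (labels 1 and 2 remain).
expensive_only = orders_df.drop(index=cheap_labels)

# reset_index(drop=True) renumbers from 0 and discards the old labels.
renumbered = expensive_only.reset_index(drop=True)
print(renumbered)
```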

# Merge DataFrames

**`pd.DataFrame.merge(right, on=None)`**
  * Returns a DataFrame of the two merged objects.
  * `right` is the object to merge with
  * `on` is the column name (or list of column names) to join on. It must be present in both DataFrames.

**Similar to the SQL JOIN operation**

In [None]:
people = {
    "full_name": ["Rama Rao", 'Kuldeep Yadav'],
    "email": ["ramarao@gmail.com", 'kuldeepy@email.com'],
    "place" : ["Hyderabad", "Lucknow"]
}

df1 = pd.DataFrame(people)
df1

In [None]:
location = {
    "place" : ["Hyderabad", "Lucknow"],
    "state" : ["Telangana", "Uttar Pradesh"]
}

df2 = pd.DataFrame(location)
df2

In [None]:
df1.merge(df2, on="place")

In [None]:
pd.DataFrame.merge?
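By default `merge` performs an inner join, so rows without a match in the other DataFrame are dropped. A small sketch of the `how` and `indicator` parameters, using hypothetical frames in which one person's place ("Kochi") has no entry in the location table:

```python
import pandas as pd

people_df = pd.DataFrame({
    "full_name": ["Rama Rao", "Kuldeep Yadav", "Asha Nair"],
    "place": ["Hyderabad", "Lucknow", "Kochi"],
})
location_df = pd.DataFrame({
    "place": ["Hyderabad", "Lucknow"],
    "state": ["Telangana", "Uttar Pradesh"],
})

# Default how="inner": only places found in both frames are kept.
inner = people_df.merge(location_df, on="place")

# how="left" keeps every row of people_df; unmatched rows get NaN for 'state'.
# indicator=True adds a '_merge' column showing where each row came from.
left = people_df.merge(location_df, on="place", how="left", indicator=True)
print(left)
```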

# Sorting


### pd.DataFrame.sort_values

* `pd.DataFrame.sort_values(by, axis=0, ascending=True, inplace=False,  ignore_index=False)`
  * Sort by the values along either axis.
  * `by` is the name or list of names to sort by.
  * `axis` is the axis to be sorted.
  * If `ignore_index` is True, then the resulting axis will be labeled 0, 1, …, n - 1.

In [None]:
films_df

In [None]:
films_df.sort_values(by='Year')

In [None]:
films_df.sort_values(by='Popularity', ascending=False)

In [None]:
pd.DataFrame.sort_values?

In [None]:
films_df.sort_values(by='Popularity', ascending=False, ignore_index=True)

In [None]:
films_df.sort_values(by=['Year', 'Popularity'], ascending=[True, False])

We can use `sort_values` with a `Series` too.

In [None]:
films_df['Length'].sort_values(ascending=False)
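To make the multi-column sort easier to follow, here is a sketch on a hypothetical four-row frame (`toy_films` and its values are made up). Rows are ordered by `Year` ascending first, and ties within a year are broken by `Popularity` descending. The original frame is not modified.

```python
import pandas as pd

toy_films = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Year": [1991, 1990, 1991, 1990],
    "Popularity": [40, 75, 88, 12],
})

# One ascending flag per column in `by`; ignore_index renumbers the result.
ranked = toy_films.sort_values(
    by=["Year", "Popularity"],
    ascending=[True, False],
    ignore_index=True,
)
print(ranked)
```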

### NLargest and NSmallest

**nlargest** returns the first n rows ordered by columns in descending order.

In [None]:
films_df['Length']

In [None]:
films_df['Length'].nlargest(10)

In [None]:
films_df.nlargest(15, ['Popularity', 'Length'])

**nsmallest** returns the first n rows ordered by columns in ascending order.

In [None]:
films_df['Year'].nsmallest(5)

In [None]:
films_df.nsmallest(5, 'Length')
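A brief sketch, on a hypothetical `lengths` Series, of how `nlargest` relates to sorting: it selects the same values as `sort_values(ascending=False).head(n)`. The `keep="all"` option, shown with `nsmallest`, retains every row tied for the last place instead of cutting off at `n`.

```python
import pandas as pd

# Hypothetical film lengths; the value 95 appears twice.
lengths = pd.Series([111, 95, 140, 95, 128], name="Length")

# The three largest values, with their original index labels.
top3 = lengths.nlargest(3)

# Sorting in descending order and taking the head selects the same values.
via_sort = lengths.sort_values(ascending=False).head(3)

# keep="all" returns both rows tied for the smallest value.
smallest_with_ties = lengths.nsmallest(1, keep="all")

print(top3)
print(smallest_with_ties)
```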

# Try It Yourself

For the following questions, use the **Covid** dataset.

0.  Load the dataset into a dataframe using `read_csv`
1.  Add a row to the dataset, containing stats for September 5th.
2.  Sort the dataset in ascending order of `date` and descending order of `new_cases`.
3.  Get the top 15 days with the largest number of `new_cases`.
4. Drop the `new_deaths` and `new_tests` columns of the dataset.
5. Drop the first 5 rows of the dataset.
