# Lec 10 : Pandas Tutorial for Handling Missing Data (Replace and interpolate)

![image.png](attachment:e2f4fe34-5bf0-4b46-869d-aa5b93e2265c.png)

It looks like you are working with the `replace` method in pandas to replace values in a DataFrame. However, I noticed a couple of things:

1. In the code snippet, `replace` is called, but its result is not assigned to a variable, so the changes are not stored. By default `replace` returns a new DataFrame and leaves the original unchanged.

2. To keep a replacement, assign the result back to the DataFrame (or to another variable), or pass `inplace=True`.

Here's a modified version of your code with corrections:

```python
import pandas as pd

# Reading the CSV file into a DataFrame
df = pd.read_csv("movies.csv")

# Displaying the original DataFrame
print("Original DataFrame:")
print(df)

# Replace specific value "PK" with "Praveen Kumar Singh"
df.replace(to_replace="PK", value="Praveen Kumar Singh", inplace=True)

# Displaying the DataFrame after the first replacement
print("\nDataFrame after replacing 'PK' with 'Praveen Kumar Singh':")
print(df)

# Replace all uppercase letters with "Python"
df.replace("[A-Z]", "Python", regex=True, inplace=True)

# Displaying the DataFrame after replacing uppercase letters with "Python"
print("\nDataFrame after replacing uppercase letters with 'Python':")
print(df)

# Replace all lowercase letters with "Python"
df.replace("[a-z]", "Python", regex=True, inplace=True)

# Displaying the DataFrame after replacing lowercase letters with 'Python'
print("\nDataFrame after replacing lowercase letters with 'Python':")
print(df)

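# Replace every "Movie Title" cell that contains an uppercase letter with 143
# (assign the column back: inplace=True on df['Movie Title'] is chained assignment
# and does not update df under pandas' Copy-on-Write mode)
df['Movie Title'] = df['Movie Title'].replace('[A-Z]', 143, regex=True)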

# Displaying the DataFrame after the last replacement
print("\nDataFrame after replacing values in 'Movie Title' column with 143:")
print(df)
```

This corrected code includes appropriate assignments and prints to demonstrate the changes at each step.
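A small sketch of the three `replace` behaviors used above, on hypothetical toy data rather than `movies.csv`. It assumes pandas' handling of `regex=True`: with a string `value`, the pattern is substituted inside each cell (like `re.sub`), while a non-string `value` such as `143` replaces the whole cell whenever the pattern matches.

```python
import pandas as pd

# Hypothetical toy data (not the movies.csv file)
df = pd.DataFrame({
    "Director": ["PK", "Nolan", "PK"],
    "Title": ["Kgf", "tenet", "Dunki"],
})

# Exact-match replacement: only cells equal to "PK" change; assign the result back
df = df.replace(to_replace="PK", value="Praveen Kumar Singh")

# regex=True with a string value: every match inside a cell is substituted
titles_sub = df["Title"].replace("[A-Z]", "_", regex=True)

# regex=True with a non-string value: any cell containing a match is replaced whole
titles_whole = df["Title"].replace("[A-Z]", 143, regex=True)

print(df["Director"].tolist())  # ['Praveen Kumar Singh', 'Nolan', 'Praveen Kumar Singh']
print(titles_sub.tolist())      # ['_gf', 'tenet', '_unki']
print(titles_whole.tolist())    # [143, 'tenet', 143]
```

This is why the last step of the code above turns whole titles into `143`, while the `"Python"` steps grow the strings letter by letter.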

In [1]:
import pandas as pd
df = pd.read_csv("finnew.csv")
df

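| | Movie ID | Budget | Box Office Collection | Unit | Currency |
|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN |
| 1 | 102.0 | 200.0 | 954.8 | Millions | USD |
| 2 | 103.0 | NaN | 644.8 | Millions | USD |
| 3 | 104.0 | 180.0 | 854.0 | Millions | USD |
| 4 | 105.0 | 250.0 | 670.0 | Millions | USD |
| 5 | 406.0 | 30.0 | 350.0 | Millions | INR |
| 6 | 107.0 | 400.0 | 2000.0 | Millions | INR |
| 7 | 108.0 | 550.0 | 4000.0 | Millions | INR |
| 8 | 109.0 | 390.0 | 1360.0 | Millions | INR |
| 9 | 110.0 | 1.4 | 3.5 | Billions | INR |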


![image.png](attachment:9a7fb7ae-0cd4-4665-81fe-d7fe2dd99e8a.png)

It appears you are using the `interpolate` method in pandas to fill missing values in a DataFrame. However, there are a couple of things to note:

1. In your code, the result of each operation is neither displayed nor assigned to a variable.

2. The `inplace=True` parameter modifies the DataFrame directly and returns `None`, so nothing is displayed, and every later call works on the already-interpolated data instead of the original.

Here's a modified version of your code with appropriate assignments and prints:

```python
import pandas as pd

# Reading the CSV file into a DataFrame
df = pd.read_csv("your_csv_file.csv")

# Displaying the original DataFrame
print("Original DataFrame:")
print(df)

# Interpolate missing values linearly down each column (axis=0, the default)
df_interpolated_linear = df.interpolate(method="linear", axis=0)

# Displaying the DataFrame after linear interpolation
print("\nDataFrame after linear interpolation along columns:")
print(df_interpolated_linear)

# Fill at most 2 consecutive NaNs in each gap (limit=2)
df_interpolated_limit = df.interpolate(limit=2)

# Displaying the DataFrame after interpolation with a limit
print("\nDataFrame after interpolation with a limit of 2:")
print(df_interpolated_limit)

# Interpolate missing values forward with a limit of 2
df_interpolated_forward = df.interpolate(limit_direction="forward", limit=2)

# Displaying the DataFrame after forward interpolation with a limit
print("\nDataFrame after forward interpolation with a limit of 2:")
print(df_interpolated_forward)

# Interpolate missing values backward with a limit of 2
df_interpolated_backward = df.interpolate(limit_direction="backward", limit=2)

# Displaying the DataFrame after backward interpolation with a limit
print("\nDataFrame after backward interpolation with a limit of 2:")
print(df_interpolated_backward)

# Interpolate missing values in both directions with a limit of 1
df_interpolated_both = df.interpolate(limit_direction="both", limit=1)

# Displaying the DataFrame after bidirectional interpolation with a limit
print("\nDataFrame after bidirectional interpolation with a limit of 1:")
print(df_interpolated_both)

# Fill only NaNs surrounded by valid values (limit_area="inside")
df_interpolated_limit_area_inside = df.interpolate(limit_area="inside")

# Displaying the DataFrame after interpolation inside the limit area
print("\nDataFrame after interpolation inside the limit area:")
print(df_interpolated_limit_area_inside)

# Fill only NaNs outside the valid values (limit_area="outside");
# with the default limit_direction="forward", that means trailing NaNs
df_interpolated_limit_area_outside = df.interpolate(limit_area="outside")

# Displaying the DataFrame after interpolation outside the limit area
print("\nDataFrame after interpolation outside the limit area:")
print(df_interpolated_limit_area_outside)

# Inplace interpolation with a limit in both directions
df.interpolate(limit_direction="both", limit=2, inplace=True)

# Displaying the DataFrame after inplace interpolation
print("\nDataFrame after inplace interpolation:")
print(df)
```

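This corrected code includes appropriate assignments and prints to demonstrate the changes at each step. Replace the file name ("your_csv_file.csv") with your actual CSV file, for example `finnew.csv` as in the cell above. Note that `interpolate` only fills numeric columns; to avoid depending on how your pandas version treats text columns such as `Unit` and `Currency`, you can interpolate `df.select_dtypes("number")` instead.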
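To see what `limit`, `limit_direction` and `limit_area` do without the CSV file, here is a sketch on a small hypothetical Series with one leading NaN, an interior gap of three NaNs, and one trailing NaN, using the default linear method.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric Series: leading NaN, a gap of three, trailing NaN
s = pd.Series([np.nan, 1.0, np.nan, np.nan, np.nan, 5.0, np.nan])

default = s.interpolate()                      # linear, limit_direction="forward"
limited = s.interpolate(limit=2)               # at most 2 consecutive NaNs per gap
inside = s.interpolate(limit_area="inside")    # only NaNs between valid values
outside = s.interpolate(limit_area="outside")  # only NaNs outside them (forward: trailing)
both = s.interpolate(limit_direction="both")   # also fills the leading NaN

results = [("default", default), ("limit=2", limited), ("inside", inside),
           ("outside", outside), ("both", both)]
for name, result in results:
    print(f"{name:>8}: {result.tolist()}")
```

With the default `limit_direction="forward"`, the leading NaN stays empty while the trailing NaN is filled with the last valid value (5.0); `limit=2` leaves the third NaN of the interior gap unfilled.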

# Lec 11 : How to Merge and Concat DataFrames

![image.png](attachment:546ea86e-4a4e-4401-938d-afe8f7a97b00.png)

# Merge

In [2]:
import pandas as pd

# Creating two DataFrames
df1 = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [11, 12, 13, 14]
})

df2 = pd.DataFrame({
    "A": [1, 2, 3, 5],
    "C": [31, 32, 33, 34]
})

# Displaying the original DataFrames
print("DataFrame 1:")
print(df1)

print("\nDataFrame 2:")
print(df2)

# Inner Join - Rows with common values in column 'A'
inner_join = pd.merge(df1, df2, on="A", how="inner", indicator=True)
print("\nInner Join:")
print(inner_join)

# Outer Join - All rows from both DataFrames, NaN for missing values
outer_join = pd.merge(df1, df2, on="A", how="outer", indicator=True)
print("\nOuter Join:")
print(outer_join)

# Left Join - All rows from the left DataFrame, NaN for missing values on the right
left_join = pd.merge(df1, df2, on="A", how="left", indicator=True)
print("\nLeft Join:")
print(left_join)

# Right Join - All rows from the right DataFrame, NaN for missing values on the left
right_join = pd.merge(df1, df2, on="A", how="right", indicator=True)
print("\nRight Join:")
print(right_join)

# Merging based on indices with suffixes
merged_df = pd.merge(df1, df2, left_index=True, right_index=True, suffixes=("_Name", "_Id"))
print("\nMerged DataFrame:")
print(merged_df)


DataFrame 1:
   A   B
0  1  11
1  2  12
2  3  13
3  4  14

DataFrame 2:
   A   C
0  1  31
1  2  32
2  3  33
3  5  34

Inner Join:
   A   B   C _merge
0  1  11  31   both
1  2  12  32   both
2  3  13  33   both

Outer Join:
   A     B     C      _merge
0  1  11.0  31.0        both
1  2  12.0  32.0        both
2  3  13.0  33.0        both
3  4  14.0   NaN   left_only
4  5   NaN  34.0  right_only

Left Join:
   A   B     C     _merge
0  1  11  31.0       both
1  2  12  32.0       both
2  3  13  33.0       both
3  4  14   NaN  left_only

Right Join:
   A     B   C      _merge
0  1  11.0  31        both
1  2  12.0  32        both
2  3  13.0  33        both
3  5   NaN  34  right_only

Merged DataFrame:
   A_Name   B  A_Id   C
0       1  11     1  31
1       2  12     2  32
2       3  13     3  33
3       4  14     5  34
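The `_merge` column added by `indicator=True` can also serve as a filter. For example, keeping only the `left_only` rows of the outer join lists the keys of `df1` that have no match in `df2` (often called an anti-join). A sketch with the same `df1` and `df2`:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [11, 12, 13, 14]})
df2 = pd.DataFrame({"A": [1, 2, 3, 5], "C": [31, 32, 33, 34]})

outer = pd.merge(df1, df2, on="A", how="outer", indicator=True)

# Rows whose key appears only in df1 (a "left anti-join")
only_left = outer[outer["_merge"] == "left_only"]
print(only_left)

# How many rows came from each side
counts = outer["_merge"].value_counts()
print(counts)
```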


In [3]:
# Reading movies data from CSV into DataFrame with custom column names and setting "Movie_ID" as the index
df_movies = pd.read_csv("movies.csv", names=["Movie_ID", "Title", "Industry", "Release_Year", "IMDB_Ratings", "Studio", "Language_ID"], index_col="Movie_ID")

# Reading finances data from CSV into DataFrame with custom column names and setting "Movie_ID" as the index
df_finances = pd.read_csv("fin.csv", names=["Movie_ID", "Budget", "Revenue", "Unit", "Currency"], index_col="Movie_ID")


In [4]:
df_movies.head()

| Movie_ID | Title | Industry | Release_Year | IMDB_Ratings | Studio | Language_ID |
|---|---|---|---|---|---|---|
| Movie ID | Movie Title | Industry | Release \nYear | IMDb \nRating | production_studio | Language Id |
| 101 | K.G.F: Chapter 2 | Bollywood | 2022 | 8.4 | Hombale Films | 3 |
| 102 | Doctor Strange in the Multiverse of Madness | Hollywood | 2022 | 7 | Marvel Studios | 5 |
| 103 | Thor: The Dark World | Hollywood | 2013 | 6.8 | Marvel Studios | 5 |
| 104 | Thor: Ragnarok | Hollywood | 2017 | 7.9 | Marvel Studios | 5 |


In [5]:
df_finances.head()

| Movie_ID | Budget | Revenue | Unit | Currency |
|---|---|---|---|---|
| Movie ID | Budget | Box Office Collection | Unit | Currency |
| 101 | 1 | 12.5 | Billions | INR |
| 102 | 200 | 954.8 | Millions | USD |
| 103 | 165 | 644.8 | Millions | USD |
| 104 | 180 | 854 | Millions | USD |
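Notice that the first data row in both outputs above is the file's own header (`Movie ID`, `Movie Title`, and so on). That happens because `names=` was passed without `header=0`: pandas then treats every line, including the original header, as data, which also leaves `Movie_ID` and the numeric columns as strings. The sketch below reproduces this on two rows copied from the `fin.csv` output, using `io.StringIO` in place of the file; passing `header=0` alongside `names=` replaces the header line instead.

```python
import io

import pandas as pd

# Two rows copied from the fin.csv output above, plus its header line
csv_text = """Movie ID,Budget,Box Office Collection,Unit,Currency
101,1,12.5,Billions,INR
102,200,954.8,Millions,USD
"""

cols = ["Movie_ID", "Budget", "Revenue", "Unit", "Currency"]

# names= alone: the original header line is read as a data row
kept_header = pd.read_csv(io.StringIO(csv_text), names=cols, index_col="Movie_ID")

# names= with header=0: the header line is replaced by the new names
fixed = pd.read_csv(io.StringIO(csv_text), names=cols, header=0, index_col="Movie_ID")

print(kept_header.index.tolist())  # ['Movie ID', '101', '102']
print(fixed.index.tolist())        # [101, 102]
```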


In [6]:
import pandas as pd

# Reading the movies data
df_movies = pd.read_csv("movies.csv", names=["Movie_ID", "Title", "Industry", "Release_Year", "IMDB_Ratings", "Studio", "Language_ID"], index_col="Movie_ID")

# Reading the finances data
df_finances = pd.read_csv("fin.csv", names=["Movie_ID", "Budget", "Revenue", "Unit", "Currency"], index_col="Movie_ID")

# Performing inner join and adding an indicator column
movies_fin_inner = pd.merge(df_movies, df_finances, how="inner", on="Movie_ID", indicator=True)
print("Inner Join:")
print(movies_fin_inner)

# Performing outer join and adding an indicator column
movies_fin_outer = pd.merge(df_movies, df_finances, how="outer", on="Movie_ID", indicator=True)
print("\nOuter Join:")
print(movies_fin_outer)

# Performing left join and adding an indicator column
movies_fin_left = pd.merge(df_movies, df_finances, how="left", on="Movie_ID", indicator=True)
print("\nLeft Join:")
print(movies_fin_left)

# Performing right join and adding an indicator column
movies_fin_right = pd.merge(df_movies, df_finances, how="right", on="Movie_ID", indicator=True)
print("\nRight Join:")
print(movies_fin_right)

# A "full" join is the same as the outer join above (pandas has no separate how="full")
movies_fin_full = pd.merge(df_movies, df_finances, how="outer", on="Movie_ID", indicator=True)
print("\nFull (Outer) Join:")
print(movies_fin_full)

# Cross join (Cartesian product of all rows): supported directly with how="cross" since pandas 1.2
movies_fin_cross = pd.merge(df_movies, df_finances, how="cross")
print("\nCross Join:")
print(movies_fin_cross)


Inner Join:
                                                Title   Industry  \
Movie_ID                                                           
Movie ID                                  Movie Title   Industry   
101                                  K.G.F: Chapter 2  Bollywood   
102       Doctor Strange in the Multiverse of Madness  Hollywood   
103                             Thor: The Dark World   Hollywood   
104                                   Thor: Ragnarok   Hollywood   
105                           Thor: Love and Thunder   Hollywood   
107                       Dilwale Dulhania Le Jayenge  Bollywood   
108                                          3 Idiots  Bollywood   
109                          Kabhi Khushi Kabhie Gham  Bollywood   
110                                  Bajirao Mastani   Bollywood   
111                          The Shawshank Redemption  Hollywood   
113                                      Interstellar  Hollywood   
114                                 

![image.png](attachment:97709c6e-632d-4100-b8ec-01922ababd6c.png)
# Concat

In [7]:
import pandas as pd

# Concatenating two Series vertically
ser1 = pd.Series([1, 2, 3, 4])
ser2 = pd.Series([11, 12, 13, 14])
concatenated_series = pd.concat([ser1, ser2])
print("Concatenated Series:")
print(concatenated_series)

# Creating two DataFrames
df1 = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [11, 12, 13, 14]
})

df2 = pd.DataFrame({
    "A": [1, 2, 3, 5],
    "B": [31, 32, 33, 34]
})

# Concatenating DataFrames vertically
concatenated_df_vertical = pd.concat([df1, df2], axis=0)
print("\nConcatenated DataFrame Vertically:")
print(concatenated_df_vertical)

# Concatenating DataFrames horizontally
concatenated_df_horizontal = pd.concat([df1, df2], axis=1)
print("\nConcatenated DataFrame Horizontally:")
print(concatenated_df_horizontal)

# Creating two DataFrames with different columns
df1 = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [11, 12, 13, 14]
})

df2 = pd.DataFrame({
    "A": [1, 2, 3, 5],
    "C": [31, 32, 33, 34]
})

# Concatenating DataFrames vertically with different columns
concatenated_df_diff_columns_vertical = pd.concat([df1, df2], axis=0)
print("\nConcatenated DataFrame Vertically with Different Columns:")
print(concatenated_df_diff_columns_vertical)

# Concatenating DataFrames horizontally with different columns
concatenated_df_diff_columns_horizontal = pd.concat([df1, df2], axis=1)
print("\nConcatenated DataFrame Horizontally with Different Columns:")
print(concatenated_df_diff_columns_horizontal)

# Creating two DataFrames with different rows
df1 = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [11, 12, 13, 14]
})

df2 = pd.DataFrame({
    "A": [1, 2],
    "C": [31, 32]
})

# Concatenating DataFrames vertically with different rows
concatenated_df_diff_rows_vertical = pd.concat([df1, df2], axis=0)
print("\nConcatenated DataFrame Vertically with Different Rows:")
print(concatenated_df_diff_rows_vertical)

# Concatenating DataFrames horizontally with different rows
concatenated_df_diff_rows_horizontal = pd.concat([df1, df2], axis=1)
print("\nConcatenated DataFrame Horizontally with Different Rows:")
print(concatenated_df_diff_rows_horizontal)

# Creating two DataFrames with no column names in common
df1 = pd.DataFrame({
    "D": [1, 2, 3, 4]
})

df2 = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "C": [31, 32, 33, 34]
})

# Concatenating vertically; keys adds an outer index level recording each row's source
concatenated_df_keys_vertical = pd.concat([df1, df2], axis=0, keys=["df1", "df2"])
print("\nConcatenated DataFrame Vertically with Keys:")
print(concatenated_df_keys_vertical)

# Concatenating horizontally; keys adds an outer column level instead
concatenated_df_keys_horizontal = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])
print("\nConcatenated DataFrame Horizontally with Keys:")
print(concatenated_df_keys_horizontal)


Concatenated Series:
0     1
1     2
2     3
3     4
0    11
1    12
2    13
3    14
dtype: int64

Concatenated DataFrame Vertically:
   A   B
0  1  11
1  2  12
2  3  13
3  4  14
0  1  31
1  2  32
2  3  33
3  5  34

Concatenated DataFrame Horizontally:
   A   B  A   B
0  1  11  1  31
1  2  12  2  32
2  3  13  3  33
3  4  14  5  34

Concatenated DataFrame Vertically with Different Columns:
   A     B     C
0  1  11.0   NaN
1  2  12.0   NaN
2  3  13.0   NaN
3  4  14.0   NaN
0  1   NaN  31.0
1  2   NaN  32.0
2  3   NaN  33.0
3  5   NaN  34.0

Concatenated DataFrame Horizontally with Different Columns:
   A   B  A   C
0  1  11  1  31
1  2  12  2  32
2  3  13  3  33
3  4  14  5  34

Concatenated DataFrame Vertically with Different Rows:
   A     B     C
0  1  11.0   NaN
1  2  12.0   NaN
2  3  13.0   NaN
3  4  14.0   NaN
0  1   NaN  31.0
1  2   NaN  32.0

Concatenated DataFrame Horizontally with Different Rows:
   A   B    A     C
0  1  11  1.0  31.0
1  2  12  2.0  32.0
2  3  13  NaN   NaN
3
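The outputs above repeat the index labels 0 to 3 and fill missing columns with NaN. Two `pd.concat` options address this: `ignore_index=True` renumbers the rows, and `join="inner"` keeps only the columns every input shares. With `keys=`, each block can later be selected by its key. A sketch using the `df1`/`df2` pair with columns A/B and A/C:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [11, 12, 13, 14]})
df2 = pd.DataFrame({"A": [1, 2, 3, 5], "C": [31, 32, 33, 34]})

# ignore_index=True numbers the rows 0..7 instead of repeating 0..3
stacked = pd.concat([df1, df2], ignore_index=True)

# join="inner" keeps only the columns both frames share (here just "A")
shared = pd.concat([df1, df2], join="inner", ignore_index=True)

# keys= labels each block so it can be pulled back out with .loc
by_source = pd.concat([df1, df2], keys=["df1", "df2"])

print(stacked.index.tolist())   # [0, 1, 2, 3, 4, 5, 6, 7]
print(shared.columns.tolist())  # ['A']
print(by_source.loc["df2"])
```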

# Lec 12 : Pandas GroupBy - Guide to Grouping Data
![image.png](attachment:6083eeba-62ac-4e73-b8c5-f07f6dcae286.png)

In [8]:
import pandas as pd

# Creating a DataFrame
df_random = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "B", "C", "D", "A", "B", "D"],
    "Score": [34, 23, 12, 12, 23, 21, 23, 34, 12, 12]
})

# Displaying the original DataFrame
print("Original DataFrame:")
print(df_random)

# Grouping the DataFrame by the "Name" column
group_df_random = df_random.groupby("Name")

# Iterating over the groups and printing each group
for name, group in group_df_random:
    print(f"Group Name: {name}")
    print(group)
    print()

# Accessing a specific group (e.g., group with "Name" = "D")
group_d = group_df_random.get_group("D")
print("Group with Name 'D':")
print(group_d)

# Applying aggregation functions to the grouped data
min_scores = group_df_random.min()
max_scores = group_df_random.max()
mean_scores = group_df_random.mean()

# Displaying aggregated results
print("\nMinimum Scores:")
print(min_scores)
print("\nMaximum Scores:")
print(max_scores)
print("\nMean Scores:")
print(mean_scores)

# Describing the grouped data
describe_grouped = group_df_random.describe()
print("\nDescribe Grouped Data:")
print(describe_grouped)

# Converting the groupby object to a list for further inspection
list_group_df_random = list(group_df_random)
print("\nList of Grouped DataFrames:")
print(list_group_df_random)




Original DataFrame:
  Name  Score
0    A     34
1    B     23
2    C     12
3    D     12
4    B     23
5    C     21
6    D     23
7    A     34
8    B     12
9    D     12
Group Name: A
  Name  Score
0    A     34
7    A     34

Group Name: B
  Name  Score
1    B     23
4    B     23
8    B     12

Group Name: C
  Name  Score
2    C     12
5    C     21

Group Name: D
  Name  Score
3    D     12
6    D     23
9    D     12

Group with Name 'D':
  Name  Score
3    D     12
6    D     23
9    D     12

Minimum Scores:
      Score
Name       
A        34
B        12
C        12
D        12

Maximum Scores:
      Score
Name       
A        34
B        23
C        21
D        23

Mean Scores:
          Score
Name           
A     34.000000
B     19.333333
C     16.500000
D     15.666667

Describe Grouped Data:
     Score                                                     
     count       mean       std   min    25%   50%    75%   max
Name                                                 
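The separate `min()`, `max()` and `mean()` calls above can also be computed in one pass with `agg`, which returns one column per function. A sketch on the same `df_random` data, selecting the `Score` column first so only it is aggregated:

```python
import pandas as pd

df_random = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "B", "C", "D", "A", "B", "D"],
    "Score": [34, 23, 12, 12, 23, 21, 23, 34, 12, 12],
})

# One row per group, one column per aggregation
summary = df_random.groupby("Name")["Score"].agg(["min", "max", "mean", "count"])
print(summary)
```

The `min`, `max` and `mean` columns match the separate results printed above.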

# Lec 13 : How to Join and Append DataFrames 
![image.png](attachment:0232e991-2733-490f-b19a-dc17f2186f8e.png)

# Join

Here's an example that creates two sample DataFrames and joins them on their shared 'ID' column with `pd.merge`:

```python
import pandas as pd

# Creating the first DataFrame
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
})

# Creating the second DataFrame
df2 = pd.DataFrame({
    'ID': [2, 3, 4, 5],
    'Age': [25, 30, 35, 40]
})

# Performing an inner join on the 'ID' column
result_inner = pd.merge(df1, df2, on='ID', how='inner')

# Performing a left join on the 'ID' column
result_left = pd.merge(df1, df2, on='ID', how='left')

# Performing a right join on the 'ID' column
result_right = pd.merge(df1, df2, on='ID', how='right')

# Performing an outer join on the 'ID' column
result_outer = pd.merge(df1, df2, on='ID', how='outer')

# Displaying the results
print("Inner Join:")
print(result_inner)

print("\nLeft Join:")
print(result_left)

print("\nRight Join:")
print(result_right)

print("\nOuter Join:")
print(result_outer)
```

In this example, we have two DataFrames (`df1` and `df2`) with a common column 'ID'. We perform inner, left, right, and outer joins using the `pd.merge` function from Pandas. You can customize the join type and columns based on your specific use case.

In [9]:
import pandas as pd

# Creating the first DataFrame
df1 = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': ['Alice', 'Bob', 'Charlie', 'David']
})

# Creating the second DataFrame
df2 = pd.DataFrame({
    'A': [2, 3, 4, 5, 6, 7],
    'B': ['Bob', 'Charlie', 'David', 'Praveen', 'Rahul', 'Priyanka']
})

# Left Join (DataFrame.join matches rows by index label 0, 1, 2, ..., not by the values in column 'A')
left_join_result = df1.join(df2, how="left", lsuffix="_df1", rsuffix="_df2")
print("Left Join:")
print(left_join_result)

# Right Join
right_join_result = df1.join(df2, how="right", lsuffix="_df1", rsuffix="_df2")
print("\nRight Join:")
print(right_join_result)

# Inner Join
inner_join_result = df1.join(df2, how="inner", lsuffix="_df1", rsuffix="_df2")
print("\nInner Join:")
print(inner_join_result)

# Outer Join
outer_join_result = df1.join(df2, how="outer", lsuffix="_df1", rsuffix="_df2")
print("\nOuter Join:")
print(outer_join_result)



Left Join:
   A_df1    B_df1  A_df2    B_df2
0      1    Alice      2      Bob
1      2      Bob      3  Charlie
2      3  Charlie      4    David
3      4    David      5  Praveen

Right Join:
   A_df1    B_df1  A_df2     B_df2
0    1.0    Alice      2       Bob
1    2.0      Bob      3   Charlie
2    3.0  Charlie      4     David
3    4.0    David      5   Praveen
4    NaN      NaN      6     Rahul
5    NaN      NaN      7  Priyanka

Inner Join:
   A_df1    B_df1  A_df2    B_df2
0      1    Alice      2      Bob
1      2      Bob      3  Charlie
2      3  Charlie      4    David
3      4    David      5  Praveen

Outer Join:
   A_df1    B_df1  A_df2     B_df2
0    1.0    Alice      2       Bob
1    2.0      Bob      3   Charlie
2    3.0  Charlie      4     David
3    4.0    David      5   Praveen
4    NaN      NaN      6     Rahul
5    NaN      NaN      7  Priyanka
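In the output above, `A_df1 = 1` sits next to `A_df2 = 2`: `DataFrame.join` matched the rows by their default index labels 0, 1, 2, ... rather than by the values in column `A`, which is also why the left and inner results are identical here (every index label of `df1` also exists in `df2`). If the intent is to match on `A`, one option, sketched below with the same data, is to make `A` the index on both sides before joining.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["Alice", "Bob", "Charlie", "David"]})
df2 = pd.DataFrame({
    "A": [2, 3, 4, 5, 6, 7],
    "B": ["Bob", "Charlie", "David", "Praveen", "Rahul", "Priyanka"],
})

# Make "A" the index on both sides so join() matches rows by A's values
left = df1.set_index("A")
right = df2.set_index("A")

inner = left.join(right, how="inner", lsuffix="_df1", rsuffix="_df2")
outer = left.join(right, how="outer", lsuffix="_df1", rsuffix="_df2")

print(inner)  # A = 2, 3, 4, with the same name on both sides
print(outer)  # A = 1..7, NaN where one side has no row for that A
```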


# Append

In [10]:
import pandas as pd

# Creating the first DataFrame
df1 = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': ['Alice', 'Bob', 'Charlie', 'David']
})

# Creating the second DataFrame
df2 = pd.DataFrame({
    'C': [2, 3, 4, 5, 6, 7],
    'D': ['Bob', 'Charlie', 'David', 'Praveen', 'Rahul', 'Priyanka']
})

# Concatenating df2 below df1 (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
result = pd.concat([df1, df2], ignore_index=True)

# Displaying the result
print(result)


     A        B    C         D
0  1.0    Alice  NaN       NaN
1  2.0      Bob  NaN       NaN
2  3.0  Charlie  NaN       NaN
3  4.0    David  NaN       NaN
4  NaN      NaN  2.0       Bob
5  NaN      NaN  3.0   Charlie
6  NaN      NaN  4.0     David
7  NaN      NaN  5.0   Praveen
8  NaN      NaN  6.0     Rahul
9  NaN      NaN  7.0  Priyanka
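Because `DataFrame.append` no longer exists in pandas 2.0, adding a single row also goes through `pd.concat`: wrap the row in a one-row DataFrame first. A minimal sketch, with a hypothetical new row for "Eve":

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["Alice", "Bob", "Charlie", "David"]})

# Hypothetical new row, wrapped in a one-row DataFrame
new_row = pd.DataFrame([{"A": 5, "B": "Eve"}])

# pandas >= 2.0 equivalent of the old df1.append(new_row, ignore_index=True)
df1 = pd.concat([df1, new_row], ignore_index=True)
print(df1)
```

Since both frames have the same columns, no NaN padding appears, unlike the `df1`/`df2` result above.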


# Lec 14 : Python Pandas Tutorial - Pivot Table and Melt Function - Explained
![image.png](attachment:395d5320-0e47-44b9-9d95-a9639ab43f4b.png)

# Melt()

In [11]:
import pandas as pd

df_score = pd.DataFrame({
    "Days": [1, 2, 3, 4, 5, 6],
    "English": [10, 12, 14, 15, 12, 10],
    "Maths": [17, 18, 16, 12, 19, 20]
})

# Reshaping the DataFrame using melt
melted_df = pd.melt(df_score, id_vars=["Days"], var_name="Subject", value_name="Score")

# Displaying the result
print(melted_df)


    Days  Subject  Score
0      1  English     10
1      2  English     12
2      3  English     14
3      4  English     15
4      5  English     12
5      6  English     10
6      1    Maths     17
7      2    Maths     18
8      3    Maths     16
9      4    Maths     12
10     5    Maths     19
11     6    Maths     20
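`pivot` undoes `melt`: spreading the `Subject` values back into columns recovers the original wide layout. A sketch using the same `df_score` data:

```python
import pandas as pd

df_score = pd.DataFrame({
    "Days": [1, 2, 3, 4, 5, 6],
    "English": [10, 12, 14, 15, 12, 10],
    "Maths": [17, 18, 16, 12, 19, 20],
})
melted_df = pd.melt(df_score, id_vars=["Days"], var_name="Subject", value_name="Score")

# One column per Subject again, with Days moved back from the index to a column
wide = melted_df.pivot(index="Days", columns="Subject", values="Score").reset_index()
wide.columns.name = None  # drop the leftover "Subject" label on the columns axis

print(wide)
```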


# Pivot()

In [12]:
df_score = pd.DataFrame({
    "Days": [1, 2, 3, 4, 5, 6],
    "Student_Name": ['Bob', 'Charlie', 'David', 'Praveen', 'Rahul', 'Priyanka'],
    "English": [10, 12, 14, 15, 12, 10],
    "Maths": [17, 18, 16, 12, 19, 20]
})

df_score

| | Days | Student_Name | English | Maths |
|---|---|---|---|---|
| 0 | 1 | Bob | 10 | 17 |
| 1 | 2 | Charlie | 12 | 18 |
| 2 | 3 | David | 14 | 16 |
| 3 | 4 | Praveen | 15 | 12 |
| 4 | 5 | Rahul | 12 | 19 |
| 5 | 6 | Priyanka | 10 | 20 |


In [13]:
df_score.pivot(index="Days", columns="Student_Name")

| Days | English / Bob | English / Charlie | English / David | English / Praveen | English / Priyanka | English / Rahul | Maths / Bob | Maths / Charlie | Maths / David | Maths / Praveen | Maths / Priyanka | Maths / Rahul |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 10.0 | NaN | NaN | NaN | NaN | NaN | 17.0 | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | 12.0 | NaN | NaN | NaN | NaN | NaN | 18.0 | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | 14.0 | NaN | NaN | NaN | NaN | NaN | 16.0 | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | 15.0 | NaN | NaN | NaN | NaN | NaN | 12.0 | NaN | NaN |
| 5 | NaN | NaN | NaN | NaN | NaN | 12.0 | NaN | NaN | NaN | NaN | NaN | 19.0 |
| 6 | NaN | NaN | NaN | NaN | 10.0 | NaN | NaN | NaN | NaN | NaN | 20.0 | NaN |


In [14]:
df_score.pivot(index="Days", columns="Student_Name", values="English")

| Days | Bob | Charlie | David | Praveen | Priyanka | Rahul |
|---|---|---|---|---|---|---|
| 1 | 10.0 | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | 12.0 | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | 14.0 | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | 15.0 | NaN | NaN |
| 5 | NaN | NaN | NaN | NaN | NaN | 12.0 |
| 6 | NaN | NaN | NaN | NaN | 10.0 | NaN |


In [15]:
df_score = pd.DataFrame({
    "Days": [1, 1, 1, 1, 2, 2],
    "Student_Name": ['Bob', 'Charlie', 'Bob', 'Charlie', 'Bob', 'Charlie'],
    "English": [10, 12, 14, 15, 12, 10],
    "Maths": [17, 18, 16, 12, 19, 20]
})

df_score

| | Days | Student_Name | English | Maths |
|---|---|---|---|---|
| 0 | 1 | Bob | 10 | 17 |
| 1 | 1 | Charlie | 12 | 18 |
| 2 | 1 | Bob | 14 | 16 |
| 3 | 1 | Charlie | 15 | 12 |
| 4 | 2 | Bob | 12 | 19 |
| 5 | 2 | Charlie | 10 | 20 |
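This data repeats (Student_Name, Days) pairs; for example, Bob appears twice on day 1. `pivot` only reshapes and cannot combine duplicates, so it raises a `ValueError`, while `pivot_table` aggregates them (with the mean by default), which is why the following cells switch to `pivot_table`. A short sketch of the difference:

```python
import pandas as pd

df_score = pd.DataFrame({
    "Days": [1, 1, 1, 1, 2, 2],
    "Student_Name": ["Bob", "Charlie", "Bob", "Charlie", "Bob", "Charlie"],
    "English": [10, 12, 14, 15, 12, 10],
    "Maths": [17, 18, 16, 12, 19, 20],
})

# pivot() refuses duplicate (index, column) pairs
try:
    df_score.pivot(index="Student_Name", columns="Days")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot() failed:", err)

# pivot_table() averages the duplicates instead
table = df_score.pivot_table(index="Student_Name", columns="Days",
                             values="English", aggfunc="mean")
print(table)
```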


In [16]:
df_score.pivot_table(index="Student_Name", columns="Days", aggfunc="mean")

| Student_Name | English / 1 | English / 2 | Maths / 1 | Maths / 2 |
|---|---|---|---|---|
| Bob | 12.0 | 12.0 | 16.5 | 19.0 |
| Charlie | 13.5 | 10.0 | 15.0 | 20.0 |


In [17]:
df_score.pivot_table(index="Student_Name", columns="Days", aggfunc="sum")

| Student_Name | English / 1 | English / 2 | Maths / 1 | Maths / 2 |
|---|---|---|---|---|
| Bob | 24 | 12 | 33 | 19 |
| Charlie | 27 | 10 | 30 | 20 |


In [18]:
df_score.pivot_table(index="Student_Name", columns="Days", aggfunc="median")

| Student_Name | English / 1 | English / 2 | Maths / 1 | Maths / 2 |
|---|---|---|---|---|
| Bob | 12.0 | 12.0 | 16.5 | 19.0 |
| Charlie | 13.5 | 10.0 | 15.0 | 20.0 |


In [19]:
df_score.pivot_table(index="Student_Name", columns="Days", aggfunc="sum", margins=True)

| Student_Name | English / 1 | English / 2 | English / All | Maths / 1 | Maths / 2 | Maths / All |
|---|---|---|---|---|---|---|
| Bob | 24 | 12 | 36 | 33 | 19 | 52 |
| Charlie | 27 | 10 | 37 | 30 | 20 | 50 |
| All | 51 | 22 | 73 | 63 | 39 | 102 |


In [20]:
df_score.pivot_table(index="Student_Name", columns="Days", aggfunc="mean", margins=True)

| Student_Name | English / 1 | English / 2 | English / All | Maths / 1 | Maths / 2 | Maths / All |
|---|---|---|---|---|---|---|
| Bob | 12.0 | 12.0 | 12.0 | 16.5 | 19.0 | 17.333333 |
| Charlie | 13.5 | 10.0 | 12.333333 | 15.0 | 20.0 | 16.666667 |
| All | 12.75 | 11.0 | 12.166667 | 15.75 | 19.5 | 17.0 |


## Code Basics

In [23]:
import pandas as pd

df = pd.read_csv("cb.csv")
df

| | Date | City | Temperature | Humidity |
|---|---|---|---|---|
| 0 | 05/01/17 | New York | 86 | 59 |
| 1 | 05/02/17 | New York | 54 | 67 |
| 2 | 05/03/17 | New York | 59 | 80 |
| 3 | 05/01/17 | Mumbai | 66 | 75 |
| 4 | 05/02/17 | Mumbai | 60 | 54 |
| 5 | 05/03/17 | Mumbai | 84 | 82 |
| 6 | 05/01/17 | Beijing | 73 | 64 |
| 7 | 05/02/17 | Beijing | 64 | 88 |
| 8 | 05/03/17 | Beijing | 73 | 68 |


In [25]:
df.pivot(index="Date", columns="City")

| Date | Temperature / Beijing | Temperature / Mumbai | Temperature / New York | Humidity / Beijing | Humidity / Mumbai | Humidity / New York |
|---|---|---|---|---|---|---|
| 05/01/17 | 73 | 66 | 86 | 64 | 75 | 59 |
| 05/02/17 | 64 | 60 | 54 | 88 | 54 | 67 |
| 05/03/17 | 73 | 84 | 59 | 68 | 82 | 80 |


In [26]:
df.pivot(index="Date", columns="City", values="Humidity")

| Date | Beijing | Mumbai | New York |
|---|---|---|---|
| 05/01/17 | 64 | 75 | 59 |
| 05/02/17 | 88 | 54 | 67 |
| 05/03/17 | 68 | 82 | 80 |


In [27]:
df.pivot(index="Humidity", columns="City")

| Humidity | Date / Beijing | Date / Mumbai | Date / New York | Temperature / Beijing | Temperature / Mumbai | Temperature / New York |
|---|---|---|---|---|---|---|
| 54 | NaN | 05/02/17 | NaN | NaN | 60.0 | NaN |
| 59 | NaN | NaN | 05/01/17 | NaN | NaN | 86.0 |
| 64 | 05/01/17 | NaN | NaN | 73.0 | NaN | NaN |
| 67 | NaN | NaN | 05/02/17 | NaN | NaN | 54.0 |
| 68 | 05/03/17 | NaN | NaN | 73.0 | NaN | NaN |
| 75 | NaN | 05/01/17 | NaN | NaN | 66.0 | NaN |
| 80 | NaN | NaN | 05/03/17 | NaN | NaN | 59.0 |
| 82 | NaN | 05/03/17 | NaN | NaN | 84.0 | NaN |
| 88 | 05/02/17 | NaN | NaN | 64.0 | NaN | NaN |


# Pivot Table

In [29]:
df.pivot_table(index="City", columns="Date")

| City | Humidity / 05/01/17 | Humidity / 05/02/17 | Humidity / 05/03/17 | Temperature / 05/01/17 | Temperature / 05/02/17 | Temperature / 05/03/17 |
|---|---|---|---|---|---|---|
| Beijing | 64.0 | 88.0 | 68.0 | 73.0 | 64.0 | 73.0 |
| Mumbai | 75.0 | 54.0 | 82.0 | 66.0 | 60.0 | 84.0 |
| New York | 59.0 | 67.0 | 80.0 | 86.0 | 54.0 | 59.0 |


In [42]:
df.pivot_table(index="City", columns="Date", margins=True)

| City | Humidity / 05/01/17 | Humidity / 05/02/17 | Humidity / 05/03/17 | Humidity / All | Temperature / 05/01/17 | Temperature / 05/02/17 | Temperature / 05/03/17 | Temperature / All |
|---|---|---|---|---|---|---|---|---|
| Beijing | 64.0 | 88.0 | 68.0 | 73.333333 | 73.0 | 64.0 | 73.0 | 70.0 |
| Mumbai | 75.0 | 54.0 | 82.0 | 70.333333 | 66.0 | 60.0 | 84.0 | 70.0 |
| New York | 59.0 | 67.0 | 80.0 | 68.666667 | 86.0 | 54.0 | 59.0 | 66.333333 |
| All | 66.0 | 69.666667 | 76.666667 | 70.777778 | 75.0 | 59.333333 | 72.0 | 68.777778 |
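`aggfunc` also accepts a list, producing one top-level column group per function. A sketch that types in the nine rows of `cb.csv` shown above and summarizes temperature per city; the `mean` column should agree with the `Temperature / All` margin in the table above.

```python
import pandas as pd

# The nine rows of cb.csv shown above, typed in directly
df = pd.DataFrame({
    "Date": ["05/01/17", "05/02/17", "05/03/17"] * 3,
    "City": ["New York"] * 3 + ["Mumbai"] * 3 + ["Beijing"] * 3,
    "Temperature": [86, 54, 59, 66, 60, 84, 73, 64, 73],
    "Humidity": [59, 67, 80, 75, 54, 82, 64, 88, 68],
})

# Columns become (function, value) pairs such as ("max", "Temperature")
stats = df.pivot_table(index="City", values="Temperature", aggfunc=["min", "max", "mean"])
print(stats)
```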


In [41]:
df.pivot_table(index="City", columns="Date", aggfunc="cumsum")

| | Humidity | Temperature |
|---|---|---|
| 0 | 59 | 86 |
| 1 | 67 | 54 |
| 2 | 80 | 59 |
| 3 | 75 | 66 |
| 4 | 54 | 60 |
| 5 | 82 | 84 |
| 6 | 64 | 73 |
| 7 | 88 | 64 |
| 8 | 68 | 73 |
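`'cumsum'` is not an aggregating function: it returns one value per input row instead of one per group, so the output above is not a City-by-Date table. It appears to be the original Humidity and Temperature rows, since each (City, Date) pair has a single row and a running sum over one value is that value. If a running total per city across dates was the goal, one option is to build the table with a real aggregation first and then apply `cumsum` along the columns. The sketch types in the nine rows of `cb.csv` shown above; the dates are compared as strings, which sorts correctly here but would not once the dates span more than one year (converting with `pd.to_datetime` first would be more robust).

```python
import pandas as pd

# The nine rows of cb.csv shown above, typed in directly
df = pd.DataFrame({
    "Date": ["05/01/17", "05/02/17", "05/03/17"] * 3,
    "City": ["New York"] * 3 + ["Mumbai"] * 3 + ["Beijing"] * 3,
    "Temperature": [86, 54, 59, 66, 60, 84, 73, 64, 73],
    "Humidity": [59, 67, 80, 75, 54, 82, 64, 88, 68],
})

# Step 1: a real aggregation gives one City x Date cell per pair
temp = df.pivot_table(index="City", columns="Date", values="Temperature", aggfunc="sum")

# Step 2: running total across the Date columns, separately for each city
running = temp.cumsum(axis=1)
print(running)
```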
