**Q1. How do you load a CSV file into a Pandas DataFrame?**

**Ans)** Use `pd.read_csv('file_path.csv')`, which reads the file and returns a DataFrame.
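
As a minimal sketch of the call, the example below reads CSV text from an in-memory buffer so that it runs without a file on disk. The CSV content and the name `csv_text` are made up for illustration; in practice you would normally pass a file path instead.

```python
import io

import pandas as pd

# Hypothetical CSV content, used only so the example is self-contained.
csv_text = "team,points,assists\nA,25,5\nB,15,7\n"

# read_csv accepts a file path or any file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (2, 3)
print(list(df.columns))  # ['team', 'points', 'assists']
```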

**Q2. How do you check the data type of a column in a Pandas DataFrame?**

**Ans)** `df['column name'].dtype`

where `df` is the DataFrame.

**Q3. How do you select rows from a Pandas DataFrame based on a condition?**

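**Ans)** `df.loc[df['column_name'] >= value]`

The expression inside `loc[]` is a boolean Series; only the rows where it is `True` are returned. Any comparison operator (`==`, `!=`, `<`, `>`, `<=`, `>=`) can be used.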
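
A short sketch of condition-based selection, using the same kind of toy data as the rest of these notes. The variable name `high_scorers` and the threshold of 15 are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B'],
                   'points': [25, 12, 15, 15]})

# The comparison produces a boolean Series (one True/False per row).
mask = df['points'] >= 15

# loc keeps only the rows where the mask is True.
high_scorers = df.loc[mask]

print(high_scorers)
```

The original index labels are preserved in the result, so the selected rows here keep the labels 0, 2 and 3.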

**Q4. How do you rename columns in a Pandas DataFrame?**

**Ans)** `df.rename(columns = {'old_column_name':'new_column_name'}, inplace = True)`

**Rename all columns at once**

`df.columns = ['colname_1', 'colname_2', ..., 'colname_n']`
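
The two renaming approaches can be sketched as follows. The new column names (`pts`, `team_name`, `total_points`) are invented for the example. Note that `rename` without `inplace=True` returns a new DataFrame and leaves the original untouched, while assigning to `.columns` replaces every name and needs a list of the same length as the number of columns.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'B'], 'points': [25, 15]})

# Rename selected columns; columns missing from the mapping keep their names.
renamed = df.rename(columns={'points': 'pts'})
print(list(renamed.columns))  # ['team', 'pts']

# Replace all column names at once by assigning a full list.
renamed.columns = ['team_name', 'total_points']
print(list(renamed.columns))  # ['team_name', 'total_points']
```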

**Q5. How do you drop columns in a Pandas DataFrame?**

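**Ans)** `df.drop(columns='col_name', inplace=True)`

The `columns=` keyword already identifies the axis, so `axis=1` is not needed with it. The equivalent older form is `df.drop('col_name', axis=1, inplace=True)`.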
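
A small sketch of dropping one or several columns. It returns a new DataFrame rather than using `inplace=True`, which is a stylistic choice made here for clarity; the name `trimmed` is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B'],
                   'points': [25, 12, 15, 15],
                   'assists': [5, 7, 7, 7],
                   'rebounds': [11, 8, 6, 6]})

# Pass a list to drop several columns in one call.
trimmed = df.drop(columns=['assists', 'rebounds'])

print(list(trimmed.columns))  # ['team', 'points']
print(list(df.columns))       # the original still has all four columns
```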

**Q6. How do you find the unique values in a column of a Pandas DataFrame?**

**Ans)** `df['column_name'].unique()`

**Q7. How do you find the number of missing values in each column of a Pandas DataFrame?**

**Ans)** `df.isna().sum()` or `df.isnull().sum()`

In [73]:
import pandas as pd
df = pd.DataFrame({'team':['A', 'A', 'B', 'B'],
                   'points': [25, 12, 15, 15],
                   'assists': [5, 7, None, 7],
                   'rebounds': [None, 8, 6, None]})

df.head()

Unnamed: 0,team,points,assists,rebounds
0,A,25,5.0,
1,A,12,7.0,8.0
2,B,15,,6.0
3,B,15,7.0,


In [74]:
df.isna().sum()

team        0
points      0
assists     1
rebounds    2
dtype: int64

**Q8. How do you fill missing values in a Pandas DataFrame with a specific value?**

**Ans)** `df.fillna(0)`

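Here every NaN value is replaced with 0. `fillna` returns a new DataFrame; assign the result back (or pass `inplace=True`) to keep the change.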

In [75]:
df.fillna(0)

Unnamed: 0,team,points,assists,rebounds
0,A,25,5.0,0.0
1,A,12,7.0,8.0
2,B,15,0.0,6.0
3,B,15,7.0,0.0
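
When different columns need different fill values, `fillna` also accepts a dictionary keyed by column name. The sketch below reuses the data from the cell above; filling `rebounds` with its column mean is only an illustrative choice, not a recommendation.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B'],
                   'points': [25, 12, 15, 15],
                   'assists': [5, 7, None, 7],
                   'rebounds': [None, 8, 6, None]})

# Fill each column with its own value: 0 for assists, the mean for rebounds.
filled = df.fillna({'assists': 0,
                    'rebounds': df['rebounds'].mean()})

print(filled)
```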


**Q9. How do you concatenate two Pandas DataFrames?**

**Ans)** `df3 = pd.concat([df1, df2], axis =0)`

In [68]:
import pandas as pd
df1 = pd.DataFrame({'team':['A', 'A', 'B', 'B'],
                   'points': [25, 12, 15, 14],
                   'assists': [5, 7, 7, 9],
                   'rebounds': [11, 8, 9, 6]})

df2 = pd.DataFrame({'team':['B', 'B', 'B', 'B'],
                   'points': [19, 23, 25, 29],
                   'assists': [12, 9, 9, 4],
                   'rebounds': [6, 5, 9 ,12]})

display(df1)
display(df2)

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11
1,A,12,7,8
2,B,15,7,9
3,B,14,9,6


Unnamed: 0,team,points,assists,rebounds
0,B,19,12,6
1,B,23,9,5
2,B,25,9,9
3,B,29,4,12


In [70]:
df3 = pd.concat([df1,df2])
df3

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11
1,A,12,7,8
2,B,15,7,9
3,B,14,9,6
0,B,19,12,6
1,B,23,9,5
2,B,25,9,9
3,B,29,4,12


In [71]:
df3 = pd.concat([df1,df2], axis = 1)
df3

Unnamed: 0,team,points,assists,rebounds,team,points,assists,rebounds
0,A,25,5,11,B,19,12,6
1,A,12,7,8,B,23,9,5
2,B,15,7,9,B,25,9,9
3,B,14,9,6,B,29,4,12


**Q10. How do you merge two Pandas DataFrames on a specific column?**

**Ans)** `df3 = pd.merge(df1, df2, on = 'column_name', how = 'outer')`

To use another type of join, such as inner, left, or right, change the `how` parameter.
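
The effect of `how` can be sketched with two small frames that share only some keys. The frames `left` and `right` and the `city` values are hypothetical data made up for this example.

```python
import pandas as pd

left = pd.DataFrame({'team': ['A', 'B', 'C'],
                     'points': [25, 15, 30]})
right = pd.DataFrame({'team': ['A', 'B', 'D'],
                      'city': ['Austin', 'Boston', 'Denver']})

# outer: keep keys from both frames (A, B, C, D); gaps become NaN.
outer = pd.merge(left, right, on='team', how='outer')

# inner: keep only keys present in both frames (A, B).
inner = pd.merge(left, right, on='team', how='inner')

# left: keep every key from the left frame (A, B, C).
left_join = pd.merge(left, right, on='team', how='left')

print(len(outer), len(inner), len(left_join))  # 4 2 3
```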

**Q11. How do you group data in a Pandas DataFrame by a specific column and apply an aggregation function?**

**Ans)** `df.groupby('column_name').sum()`

In [38]:
# Note: this cell was run on an eight-row df (four rows per team) that is not shown above
df.groupby('team').agg(
            min_point=("points", "min"),
            max_point=("points", "max"),
            sum_point=("points", "sum"),
            count_point=("points", "count"))

Unnamed: 0_level_0,min_point,max_point,sum_point,count_point
team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
A,12,25,66,4
B,19,29,96,4
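
The named-aggregation pattern in the cell above can be reproduced end to end with the four-row frame used elsewhere in these notes. This is a sketch on different data, so the numbers differ from the output shown above.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B'],
                   'points': [25, 12, 15, 15]})

# Each keyword is an output column name; the tuple is (input column, function).
summary = df.groupby('team').agg(
    min_point=('points', 'min'),
    max_point=('points', 'max'),
    sum_point=('points', 'sum'),
    count_point=('points', 'count'))

print(summary)
```

The result is indexed by `team`, so a single value can be read with, for example, `summary.loc['A', 'sum_point']`.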


**Q12. How do you pivot a Pandas DataFrame?**

**Ans)** `DataFrame.pivot(index=None, columns=None, values=None)`

It returns a reshaped DataFrame organized by the given index and column values. Each (index, column) pair must be unique; otherwise `pivot` raises a `ValueError`.

In [83]:
import pandas as pd

df = pd.DataFrame({'team':['A', 'A', 'B', 'B'],
                   'points': [25, 12, 15, 15],
                   'assists': [5, 7, 7, 7],
                   'rebounds': [11, 8, 6, 6]})

df.head()

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11
1,A,12,7,8
2,B,15,7,6
3,B,15,7,6


In [85]:
df.pivot(columns='team', values=['points', 'assists'])

Unnamed: 0_level_0,points,points,assists,assists
team,A,B,A,B
0,25.0,,5.0,
1,12.0,,7.0,
2,,15.0,,7.0
3,,15.0,,7.0
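
The cell above omits `index`, so the existing row labels are kept and the result is mostly NaN. A more typical use supplies all three arguments, as in the sketch below. The `game` column is hypothetical and is added here only so that each (team, game) pair is unique.

```python
import pandas as pd

# Long format: one row per (team, game) observation.
long_df = pd.DataFrame({'team': ['A', 'A', 'B', 'B'],
                        'game': [1, 2, 1, 2],
                        'points': [25, 12, 15, 15]})

# Wide format: one row per team, one column per game.
wide = long_df.pivot(index='team', columns='game', values='points')

print(wide)
```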


**Q13. How do you change the data type of a column in a Pandas DataFrame?**

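**Ans)** `df['column_name'] = df['column_name'].astype(dtype)`

`astype` returns a new Series and does not modify the DataFrame, so the result must be assigned back to the column.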

In [79]:
import pandas as pd
df = pd.DataFrame({'team':['A', 'A', 'B', 'B'],
                   'points': [25, 12, 13, 15],
                   'assists': [5, 7, 7, 9],
                   'rebounds': [11, 8, 6, 7]})

df.head()

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11
1,A,12,7,8
2,B,13,7,6
3,B,15,9,7


In [81]:
df['rebounds'].dtype

dtype('int64')

In [82]:
df['rebounds'].astype('float')

0    11.0
1     8.0
2     6.0
3     7.0
Name: rebounds, dtype: float64
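
Because `astype` returns a new object, the column in `df` is still `int64` after the cell above. A minimal sketch of making the change stick, either for one column or through a dictionary passed to `DataFrame.astype`:

```python
import pandas as pd

df = pd.DataFrame({'points': [25, 12, 13, 15],
                   'rebounds': [11, 8, 6, 7]})

# Calling astype alone leaves df unchanged.
as_float = df['rebounds'].astype('float')
dtype_before = str(df['rebounds'].dtype)   # still 'int64'

# Assign the result back to change the column in the DataFrame.
df['rebounds'] = df['rebounds'].astype('float')
dtype_after = str(df['rebounds'].dtype)    # 'float64'

# Several columns can be converted at once with a dict.
converted = df.astype({'points': 'float', 'rebounds': 'int64'})

print(dtype_before, dtype_after)
print(converted.dtypes)
```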

**Q14. How do you sort a Pandas DataFrame by a specific column?**

**Ans)** `df.sort_values(by = 'column_name', inplace = True)`

**Q15. How do you create a copy of a Pandas DataFrame?**

**Ans)** `df_copy = df.copy()`
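
The reason `copy()` matters is that plain assignment only creates a second name for the same object. The sketch below illustrates the difference; the names `alias` and the value 99 are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'points': [25, 12]})

alias = df            # same object, just another name
df_copy = df.copy()   # independent copy of the data

# Modify the original DataFrame.
df.loc[0, 'points'] = 99

print(alias.loc[0, 'points'])    # 99, because alias is df
print(df_copy.loc[0, 'points'])  # 25, the copy is unaffected
```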

**Q16. How do you filter rows of a Pandas DataFrame by multiple conditions?**

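**Ans)** `df[(condition1) & (condition2) & ... & (condition_n)]`

`df[(condition1) | (condition2) | ... | (condition_n)]`

Here:

**& --> AND operator**

**| --> OR operator**

Wrap each condition in parentheses, because `&` and `|` bind more tightly than comparison operators such as `>=`.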

In [87]:
import pandas as pd
df = pd.DataFrame({'team':['A', 'A', 'B', 'B'],
                   'points': [25, 12, 15, 15],
                   'assists': [5, 7, 7, 7],
                   'rebounds': [11, 8, 6, 6]})

df.head()

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11
1,A,12,7,8
2,B,15,7,6
3,B,15,7,6


In [89]:
df[(df['points'] >=15) & (df['rebounds'] > 10)]

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11


In [90]:
df[(df['points'] >=15) | (df['rebounds'] > 10)]

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11
2,B,15,7,6
3,B,15,7,6


**Q17. How do you calculate the mean of a column in a Pandas DataFrame?**

**Ans)** `df['column'].mean()`

In [76]:
import pandas as pd
import numpy as np
df = pd.DataFrame({'team':['A', 'A', 'B', 'B'],
                   'points': [25, 12, 15, 15],
                   'assists': [5, 7, 7, 7],
                   'rebounds': [11, 8, 6, 6]})

df.head()

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11
1,A,12,7,8
2,B,15,7,6
3,B,15,7,6


In [77]:
df['points'].mean()

16.75

**Q18. How do you calculate the standard deviation of a column in a Pandas DataFrame?**

**Ans)** `df['column_name'].std()`

In [78]:
df['points'].std()

5.678908345800274

**Q19. How do you calculate the correlation between two columns in a Pandas DataFrame?**

**Ans)** `df['column_1'].corr(df['column_2'])`

By default this computes the Pearson correlation coefficient between the two columns.

In [91]:
import pandas as pd
df = pd.DataFrame({'team':['A', 'A', 'B', 'B'],
                   'points': [25, 12, 15, 15],
                   'assists': [5, 7, 7, 7],
                   'rebounds': [11, 8, 6, 6]})

df.head()

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11
1,A,12,7,8
2,B,15,7,6
3,B,15,7,6


In [92]:
df['points'].corr(df['assists'])

-0.9684959969581862

**Q20. How do you select specific columns in a DataFrame using their labels?**

**Ans)** `df[['col_1', 'col_2']]` or `df.loc[:, ['col_1', 'col_2']]`

In [93]:
import pandas as pd
df = pd.DataFrame({'team':['A', 'A', 'B', 'B'],
                   'points': [25, 12, 15, 15],
                   'assists': [5, 7, 7, 7],
                   'rebounds': [11, 8, 6, 6]})

df.head()

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11
1,A,12,7,8
2,B,15,7,6
3,B,15,7,6


In [95]:
df[['team','points']]

Unnamed: 0,team,points
0,A,25
1,A,12
2,B,15
3,B,15


In [99]:
df.loc[:,['team', 'assists']]

Unnamed: 0,team,assists
0,A,5
1,A,7
2,B,7
3,B,7


**Q21. How do you select specific rows in a DataFrame using their indexes?**

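**Ans)** `df.loc[start:stop, :]`

`loc` slices by index label, and both `start` and `stop` are included in the result.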

In [100]:
df.loc[1:3,:]

Unnamed: 0,team,points,assists,rebounds
1,A,12,7,8
2,B,15,7,6
3,B,15,7,6


**Q22. How do you sort a DataFrame by a specific column?**

**Ans)** `df.sort_values(by = 'column_name', inplace = True)`

In [65]:
import pandas as pd
df = pd.DataFrame({'team':['A', 'A', 'B', 'B'],
                   'points': [25, 12, 13, 15],
                   'assists': [5, 7, 7, 9],
                   'rebounds': [11, 8, 6, 7]})

df.head()

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11
1,A,12,7,8
2,B,13,7,6
3,B,15,9,7


In [66]:
df.sort_values(by = 'points')

Unnamed: 0,team,points,assists,rebounds
1,A,12,7,8
2,B,13,7,6
3,B,15,9,7
0,A,25,5,11


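**Q23. How do you create a new column in a DataFrame based on the values of another column?**

**Ans)** Assign an expression built from existing columns to a new column name: `df['new_column'] = expression`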

In [67]:
df['total'] = df['points'] + df['assists'] + df['rebounds']

df

Unnamed: 0,team,points,assists,rebounds,total
0,A,25,5,11,41
1,A,12,7,8,27
2,B,13,7,6,26
3,B,15,9,7,31
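
A derived column can also depend on a condition rather than on arithmetic. The sketch below uses `np.where` for this; the column name `level`, the labels `'high'`/`'low'`, and the threshold of 15 are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B'],
                   'points': [25, 12, 13, 15]})

# np.where(condition, value_if_true, value_if_false), evaluated row by row.
df['level'] = np.where(df['points'] >= 15, 'high', 'low')

print(df)
```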


**Q24. How do you remove duplicates from a DataFrame?**

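**Ans)** Use **`df.drop_duplicates()`**. By default it compares all columns and keeps the first occurrence of each duplicated row (`keep='first'`).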

In [57]:
import pandas as pd
df = pd.DataFrame({'team':['A', 'A', 'B', 'B'],
                   'points': [25, 12, 15, 15],
                   'assists': [5, 7, 7, 7],
                   'rebounds': [11, 8, 6, 6]})

df.head()

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11
1,A,12,7,8
2,B,15,7,6
3,B,15,7,6


In [58]:
df.drop_duplicates(keep='first',inplace=True)

In [59]:
df

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11
1,A,12,7,8
2,B,15,7,6


**Q25. What is the difference between .loc and .iloc in Pandas?**

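**Ans)** The main difference is that **`loc[]`** selects rows and columns by label (index labels and column names), while **`iloc[]`** selects them by integer position. Slices also behave differently: a `loc[]` slice includes its end label, whereas an `iloc[]` slice excludes its end position, as in ordinary Python slicing.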

Example

In [39]:
import pandas as pd
df = pd.DataFrame({'team':['A', 'A', 'B', 'B'],
                   'points': [25, 12, 15, 14],
                   'assists': [5, 7, 7, 9],
                   'rebounds': [11, 8, 9, 6]})

df.head()

Unnamed: 0,team,points,assists,rebounds
0,A,25,5,11
1,A,12,7,8
2,B,15,7,9
3,B,14,9,6


In [44]:
## We can use loc to select specific rows of the DataFrame based on their index labels:
df.loc[1]

team         A
points      12
assists      7
rebounds     8
Name: 1, dtype: object

In [45]:
## We can use loc to select specific rows of the DataFrame based on slicing of index labels:
df.loc[2:4]

Unnamed: 0,team,points,assists,rebounds
2,B,15,7,9
3,B,14,9,6


In [49]:
## We can use loc to select specific rows and specific columns of the DataFrame based on their labels:

df.loc[[1,3],['team','points']]

Unnamed: 0,team,points
1,A,12
3,B,14


In [56]:
## We can use iloc to select specific rows of the DataFrame based on their integer positions:
df.iloc[2]

team         B
points      15
assists      7
rebounds     9
Name: 2, dtype: object

In [52]:
## We can use iloc to select specific rows and specific columns of the DataFrame based on their integer positions:

df.iloc[0:3,0:2]

Unnamed: 0,team,points
0,A,25
1,A,12
2,B,15
