# Pandas Series, DataFrames, and Indexes Coding Practice Questions

*N.B. You will need to create your own sample data to run the questions below. Read each question carefully before attempting it, and consult the API documentation if you get stuck.*

https://pandas.pydata.org/docs/reference/index.html#api

1. Create a Pandas Series containing 10 random integers between 1 and 100.

In [1]:
import numpy as np
import pandas as pd

In [None]:
arr = np.random.randint(1, 101, 10)
ser = pd.Series(arr)
ser

0     29
1    100
2    100
3     54
4     65
5     97
6     79
7     18
8     55
9     97
dtype: int64

2. Convert the following list of dictionaries into a Pandas DataFrame:
```python
data = [
    {'name': 'Alice', 'age': 28, 'city': 'New York'},
    {'name': 'Bob', 'age': 22, 'city': 'London'},
    {'name': 'Charlie', 'age': 30, 'city': 'Paris'}
]
```

In [None]:
data = [
 {'name': 'Alice', 'age': 28, 'city': 'New York'},
 {'name': 'Bob', 'age': 22, 'city': 'London'},
 {'name': 'Charlie', 'age': 30, 'city': 'Paris'}
]
df_data = pd.DataFrame(data)
df_data

      name  age      city
0    Alice   28  New York
1      Bob   22    London
2  Charlie   30     Paris


3. Create a DataFrame with 5 rows and 3 columns filled with random numbers between 0 and 1.

In [None]:
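# Note: randint(0, 2) draws only the integers 0 and 1; for floats in [0, 1), np.random.rand(5, 3) could be used instead
df = pd.DataFrame(np.random.randint(0, 2, (5, 3)))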
df

   0  1  2
0  0  0  0
1  1  1  1
2  0  0  0
3  1  0  0
4  1  1  0
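
If the question is read as asking for floating-point values rather than the integers 0 and 1, a minimal sketch could use NumPy's `Generator.random`, which samples uniformly from the half-open interval [0, 1). The seed, the column names, and the variable name `df_float` are illustrative assumptions, not part of the exercise.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(42)

# 5 rows x 3 columns of floats drawn uniformly from [0, 1)
df_float = pd.DataFrame(rng.random((5, 3)), columns=['a', 'b', 'c'])
print(df_float)
```

Every value in `df_float` is a float64 that is at least 0 and strictly less than 1.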


4. Given a DataFrame, select the column named 'age'.

In [None]:
df_data['age']

0    28
1    22
2    30
Name: age, dtype: int64

5. Given a DataFrame, select the rows where the 'age' column is greater than 25.

In [None]:
df_data[df_data['age'] > 25]

      name  age      city
0    Alice   28  New York
2  Charlie   30     Paris


6. Create a Pandas Series with an index made of dates from '2023-01-01' to '2023-01-10'.

In [None]:
# Quick solution: the index labels below are plain strings, not datetime objects
arr = np.random.randint(0, 10, 10)
ind = ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10']
pd.Series(arr, index=ind)

2023-01-01    3
2023-01-02    5
2023-01-03    0
2023-01-04    1
2023-01-05    5
2023-01-06    3
2023-01-07    0
2023-01-08    4
2023-01-09    2
2023-01-10    0
dtype: int64

In [None]:
# Formal solution: let pd.date_range build a DatetimeIndex of 10 consecutive days
date_series = pd.Series(np.arange(10), index=pd.date_range('2023-01-01', periods=10))
date_series

2023-01-01    0
2023-01-02    1
2023-01-03    2
2023-01-04    3
2023-01-05    4
2023-01-06    5
2023-01-07    6
2023-01-08    7
2023-01-09    8
2023-01-10    9
Freq: D, dtype: int64

In [None]:
pd.date_range?
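
The two solutions above differ in the type of index they produce: hand-typed labels stay strings, while `pd.date_range` yields a `DatetimeIndex`. The following sketch, which uses a shortened three-day series with made-up values, shows one way to convert string labels with `pd.to_datetime` and then slice by date.

```python
import pandas as pd

# Three string labels with illustrative values (shortened from the exercise)
labels = ['2023-01-01', '2023-01-02', '2023-01-03']
ser_str = pd.Series([3, 5, 0], index=labels)

# Convert the string labels into a DatetimeIndex on a copy
ser_dt = ser_str.copy()
ser_dt.index = pd.to_datetime(ser_dt.index)

# With a DatetimeIndex, label-based slicing understands dates
subset = ser_dt.loc['2023-01-02':]

print(type(ser_str.index).__name__)
print(type(ser_dt.index).__name__)
print(subset)
```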

7. Given a DataFrame, set the column named 'name' as the index.

In [None]:
df_data['name']

0      Alice
1        Bob
2    Charlie
Name: name, dtype: object

In [None]:
df_indexed = df_data.set_index('name')
df_indexed

         age      city
name
Alice     28  New York
Bob       22    London
Charlie   30     Paris


8. Given a DataFrame, reset its index.

In [None]:
# df_data still has its default integer index, so the old index is kept as a new 'index' column
df_data.reset_index()

   index     name  age      city
0      0    Alice   28  New York
1      1      Bob   22    London
2      2  Charlie   30     Paris
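
Resetting is arguably more instructive on a frame whose index carries data, such as one indexed by name. The sketch below builds a small hypothetical frame called `people` and contrasts the default behaviour with `drop=True`.

```python
import pandas as pd

# Hypothetical frame indexed by 'name'
people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [28, 22, 30],
}).set_index('name')

# Default: the old index moves back into a regular column
restored = people.reset_index()

# drop=True discards the old index instead of keeping it as a column
dropped = people.reset_index(drop=True)

print(restored)
print(dropped)
```

In both cases the result has a fresh integer index starting at 0; only `restored` retains the names.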


9. Given a DataFrame, rename the column 'name' to 'first_name'.

In [None]:
df_data.rename(columns={'name': 'first_name'}, inplace=True)
df_data

  first_name  age      city
0      Alice   28  New York
1        Bob   22    London
2    Charlie   30     Paris


10. Given a DataFrame, drop the column named 'city'.

In [None]:
df_drop = df_data.drop('city', axis=1)
df_drop

  first_name  age
0      Alice   28
1        Bob   22
2    Charlie   30


11. Create a DataFrame from the following CSV string:
```csv
name,age,city
Alice,28,New York
Bob,22,London
Charlie,30,Paris
```

In [None]:
from io import StringIO

# CSV string
csv_data = """
name,age,city
Alice,28,New York
Bob,22,London
Charlie,30,Paris
"""

# Use StringIO to simulate a file object
csv_file = StringIO(csv_data)

df = pd.read_csv(csv_file)
df

      name  age      city
0    Alice   28  New York
1      Bob   22    London
2  Charlie   30     Paris


12. Given a DataFrame, save it to a CSV file named 'data.csv'.

In [None]:
df.to_csv('data.csv', index=False)
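
One way to check what `to_csv` writes without touching the file system is to write into an in-memory buffer and read it back. This is a sketch under the assumption of a small hypothetical frame named `original`; the same calls apply when a file path is passed instead of a buffer.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-row frame used only for the round trip
original = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [28, 22]})

# Write CSV text into a buffer; index=False leaves out the row labels
buffer = StringIO()
original.to_csv(buffer, index=False)

# Rewind the buffer and parse the text back into a DataFrame
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip)
```

With `index=False`, the columns that come back are exactly the ones that were written.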

13. Given a DataFrame, filter the rows where the 'age' column is between 20 and 30 (inclusive).

In [None]:
df[(df['age'] >= 20) & (df['age'] <= 30)]

      name  age      city
0    Alice   28  New York
1      Bob   22    London
2  Charlie   30     Paris
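
The same inclusive range filter can also be written with `Series.between`, whose bounds are inclusive on both sides by default. The sketch below adds a made-up fourth person, 'Dave', so that at least one row is filtered out.

```python
import pandas as pd

# Sample frame; 'Dave' is a hypothetical extra row that falls outside the range
people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'age': [28, 22, 30, 35],
})

# between(20, 30) keeps ages where 20 <= age <= 30
in_range = people[people['age'].between(20, 30)]
print(in_range)
```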


14. Given a DataFrame, calculate the mean of the 'age' column.

In [None]:
df['age'].mean()

26.666666666666668

15. Given a DataFrame, sort it by the 'age' column in descending order.

In [None]:
df.sort_values(by='age', ascending=False)

      name  age      city
2  Charlie   30     Paris
0    Alice   28  New York
1      Bob   22    London


16. Given a DataFrame, group it by the 'city' column and calculate the mean age for each city.

In [None]:
df.groupby(by='city').age.mean()

city
London      22.0
New York    28.0
Paris       30.0
Name: age, dtype: float64
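
Because every city in the sample data has a single resident, the group means simply repeat the ages. A sketch with one made-up extra row ('Dana' in London) makes the aggregation visible, and uses named aggregation to compute the mean and the count together; the output column names `mean_age` and `residents` are arbitrary choices.

```python
import pandas as pd

# Sample frame; 'Dana' is a hypothetical extra row so London has two residents
people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'age': [28, 22, 30, 26],
    'city': ['New York', 'London', 'Paris', 'London'],
})

# Named aggregation: each keyword becomes an output column
summary = people.groupby('city').agg(
    mean_age=('age', 'mean'),
    residents=('age', 'count'),
)
print(summary)
```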

17. Given a DataFrame, create a new column named 'is_adult' that contains `True` if the age is 18 or above, and `False` otherwise.

In [None]:
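# Plain assignment (is_adult = df) would only alias df; copy() leaves the original frame unchanged
is_adult = df.copy()
is_adult['is_adult'] = is_adult['age'] >= 18
is_adult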

      name  age      city  is_adult
0    Alice   28  New York      True
1      Bob   22    London      True
2  Charlie   30     Paris      True


18. Given a DataFrame, fill any missing values in the 'age' column with the mean age.

In [2]:
from io import StringIO

# CSV string
csv_data = """
name,age,city
Alice,28,New York
Bob,22,London
Charlie,30,Paris
Wayne,,Gotham
"""

# Use StringIO to simulate a file object
csv_file = StringIO(csv_data)

df = pd.read_csv(csv_file)
# Assign the result back instead of calling fillna(inplace=True) on a selected column
df['age'] = df['age'].fillna(round(df['age'].mean(), 2))
df

      name    age      city
0    Alice  28.00  New York
1      Bob  22.00    London
2  Charlie  30.00     Paris
3    Wayne  26.67    Gotham


19. Given a DataFrame, drop any rows that have missing values.

In [3]:
# Create a new row as a dictionary
new_row = {'name': 'Eve', 'age': np.nan, 'city': 'Los Angeles'}
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)

df.dropna()

      name    age         city
0    Alice  28.00     New York
1      Bob  22.00       London
2  Charlie  30.00        Paris
3    Wayne  26.67       Gotham
4      Eve    NaN  Los Angeles




      name    age      city
0    Alice  28.00  New York
1      Bob  22.00    London
2  Charlie  30.00     Paris
3    Wayne  26.67    Gotham
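
By default `dropna()` removes a row if any column is missing. When only some columns matter, the `subset` parameter restricts the check. The sketch below uses a hypothetical frame in which 'Eve' lacks an age and 'Mallory' lacks a city.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in two different columns
people = pd.DataFrame({
    'name': ['Alice', 'Eve', 'Mallory'],
    'age': [28, np.nan, 41],
    'city': ['New York', 'Los Angeles', None],
})

# Drops every row that has a missing value in any column
all_complete = people.dropna()

# Drops only the rows where 'age' is missing
has_age = people.dropna(subset=['age'])

print(all_complete)
print(has_age)
```

Neither call modifies `people`; each returns a new DataFrame.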


20. Given a DataFrame, find the row with the maximum value in the 'age' column.

In [None]:
# First attempt: df.max() returns the maximum of each column separately, not a single row
df.max()

name    Wayne
age      30.0
city    Paris
dtype: object

In [None]:
df['age'].max()

30.0

In [None]:
df[df['age'] == df['age'].max()]

      name   age   city
2  Charlie  30.0  Paris
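
An alternative to the boolean mask is `Series.idxmax`, which returns the index label of the first maximum and skips missing values by default. The sketch below assumes a small hypothetical frame; note that, unlike the mask, this approach returns only one row even when several rows tie for the maximum.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 'Eve' has a missing age
people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Eve'],
    'age': [28.0, 22.0, 30.0, np.nan],
})

# Index label of the largest age (NaN is ignored)
oldest_label = people['age'].idxmax()

# Look up the full row by that label
oldest_row = people.loc[oldest_label]
print(oldest_row)
```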
