# Pandas Series, DataFrames, and Indexes Coding Practice Questions

*N.B. You will need to create your own sample data to run the questions below.*

https://pandas.pydata.org/docs/reference/index.html#api
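
Since the questions leave the data up to you, here is one possible starting point: a minimal sketch of a small DataFrame with the `name`, `age`, and `city` columns the questions refer to. The name `practice_df`, the rows for Dana and Eve, and the missing values are all invented for illustration, so that the missing-data questions (18 and 19) have something to act on.

```python
import numpy as np
import pandas as pd

# Hypothetical practice data: the rows beyond Alice, Bob and Charlie and the
# missing values are invented here purely so every question has an effect.
practice_df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dana', 'Eve'],
    'age': [28, 22, 30, np.nan, 17],
    'city': ['New York', 'London', 'Paris', 'London', None],
})

# Count missing values per column to confirm the gaps are present.
print(practice_df.isna().sum())
```

Because `age` contains a missing value, pandas stores that column as floats rather than integers.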

1. Create a Pandas Series containing 10 random integers between 1 and 100.

2. Convert the following list of dictionaries into a Pandas DataFrame:
```python
data = [
    {'name': 'Alice', 'age': 28, 'city': 'New York'},
    {'name': 'Bob', 'age': 22, 'city': 'London'},
    {'name': 'Charlie', 'age': 30, 'city': 'Paris'}
]
```

3. Create a DataFrame with 5 rows and 3 columns filled with random numbers between 0 and 1.

4. Given a DataFrame, select the column named 'age'.

5. Given a DataFrame, select the rows where the 'age' column is greater than 25.

6. Create a Pandas Series with an index made of dates from '2023-01-01' to '2023-01-10'.

7. Given a DataFrame, set the column named 'name' as the index.

8. Given a DataFrame, reset its index.

9. Given a DataFrame, rename the column 'name' to 'first_name'.

10. Given a DataFrame, drop the column named 'city'.

11. Create a DataFrame from the following CSV string:
```csv
name,age,city
Alice,28,New York
Bob,22,London
Charlie,30,Paris
```

12. Given a DataFrame, save it to a CSV file named 'data.csv'.

13. Given a DataFrame, filter the rows where the 'age' column is between 20 and 30 (inclusive).

14. Given a DataFrame, calculate the mean of the 'age' column.

15. Given a DataFrame, sort it by the 'age' column in descending order.

16. Given a DataFrame, group it by the 'city' column and calculate the mean age for each city.

17. Given a DataFrame, create a new column named 'is_adult' that contains `True` if the age is 18 or above, and `False` otherwise.

18. Given a DataFrame, fill any missing values in the 'age' column with the mean age.

19. Given a DataFrame, drop any rows that have missing values.

20. Given a DataFrame, find the row with the maximum value in the 'age' column.

# Solutions to Pandas Series, DataFrames, and Indexes Coding Practice Questions

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Solution to Question 1
# np.random.randint excludes the upper bound, so 101 allows 100 to be drawn
series_1 = pd.Series(np.random.randint(1, 101, 10))
series_1

0    76
1    23
2    52
3     8
4     9
5    49
6    62
7    85
8    82
9    96
dtype: int64
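
As an alternative sketch for Question 1, NumPy's newer `Generator` API has an `integers` method that accepts `endpoint=True` to include the upper bound, which makes the range explicit. The seed value 42 and the name `random_series` are arbitrary choices made here so the run is repeatable.

```python
import numpy as np
import pandas as pd

# Seeded generator so the same numbers come out on every run (42 is arbitrary).
rng = np.random.default_rng(42)

# endpoint=True makes the upper bound inclusive, so 100 can be drawn.
random_series = pd.Series(rng.integers(1, 100, size=10, endpoint=True))
print(random_series)
```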

In [3]:
# Solution to Question 2
data = [
    {'name': 'Alice', 'age': 28, 'city': 'New York'},
    {'name': 'Bob', 'age': 22, 'city': 'London'},
    {'name': 'Charlie', 'age': 30, 'city': 'Paris'}
]
df = pd.DataFrame(data)
df

|   | name | age | city |
|---|------|-----|------|
| 0 | Alice | 28 | New York |
| 1 | Bob | 22 | London |
| 2 | Charlie | 30 | Paris |


In [4]:
# Solution to Question 3
df_random = pd.DataFrame(np.random.rand(5, 3), columns=['A', 'B', 'C'])
df_random

|   | A | B | C |
|---|---|---|---|
| 0 | 0.221465 | 0.322668 | 0.739875 |
| 1 | 0.611399 | 0.98443 | 0.464342 |
| 2 | 0.932359 | 0.024154 | 0.12763 |
| 3 | 0.550092 | 0.525501 | 0.024024 |
| 4 | 0.150038 | 0.184204 | 0.728092 |


In [5]:
# Solution to Question 4
age_column = df['age']
age_column

0    28
1    22
2    30
Name: age, dtype: int64

In [6]:
# Solution to Question 5
filtered_rows = df[df['age'] > 25]
filtered_rows

|   | name | age | city |
|---|------|-----|------|
| 0 | Alice | 28 | New York |
| 2 | Charlie | 30 | Paris |


In [7]:
# Solution to Question 6
date_series = pd.Series(np.arange(10), index=pd.date_range('2023-01-01', periods=10))
date_series

2023-01-01    0
2023-01-02    1
2023-01-03    2
2023-01-04    3
2023-01-05    4
2023-01-06    5
2023-01-07    6
2023-01-08    7
2023-01-09    8
2023-01-10    9
Freq: D, dtype: int64
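
Question 6 states both a start and an end date, so another way to build the index is to pass both endpoints to `pd.date_range` instead of counting periods. This is a minimal sketch; the name `date_series_alt` is made up here.

```python
import numpy as np
import pandas as pd

# Passing both endpoints yields every day from start to end, inclusive.
dates = pd.date_range(start='2023-01-01', end='2023-01-10')
date_series_alt = pd.Series(np.arange(len(dates)), index=dates)
print(date_series_alt)
```

Both forms produce the same ten daily timestamps; the start/end form avoids having to work out the number of periods by hand.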

In [8]:
# Solution to Question 7
df_indexed = df.set_index('name')
df_indexed

| name | age | city |
|------|-----|------|
| Alice | 28 | New York |
| Bob | 22 | London |
| Charlie | 30 | Paris |


In [9]:
# Solution to Question 8
df_reset = df_indexed.reset_index()
df_reset

|   | name | age | city |
|---|------|-----|------|
| 0 | Alice | 28 | New York |
| 1 | Bob | 22 | London |
| 2 | Charlie | 30 | Paris |


In [10]:
# Solution to Question 9
df_renamed = df.rename(columns={'name': 'first_name'})
df_renamed

|   | first_name | age | city |
|---|------------|-----|------|
| 0 | Alice | 28 | New York |
| 1 | Bob | 22 | London |
| 2 | Charlie | 30 | Paris |


In [11]:
# Solution to Question 10
df_dropped = df.drop(columns=['city'])
df_dropped

|   | name | age |
|---|------|-----|
| 0 | Alice | 28 |
| 1 | Bob | 22 |
| 2 | Charlie | 30 |


In [12]:
# Solution to Question 11
from io import StringIO
csv_data = """
name,age,city
Alice,28,New York
Bob,22,London
Charlie,30,Paris
"""
df_from_csv = pd.read_csv(StringIO(csv_data))
df_from_csv

|   | name | age | city |
|---|------|-----|------|
| 0 | Alice | 28 | New York |
| 1 | Bob | 22 | London |
| 2 | Charlie | 30 | Paris |


In [13]:
# Solution to Question 12
df.to_csv('data.csv', index=False)
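
To check that the file was written as expected, one option is to read it back and compare. The sketch below does this round trip in a temporary directory so it leaves no files behind; `df_demo`, `df_back`, and `csv_path` are names made up for this example.

```python
import os
import tempfile

import pandas as pd

df_demo = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [28, 22]})

# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'data.csv')
    df_demo.to_csv(csv_path, index=False)  # index=False omits the row labels
    df_back = pd.read_csv(csv_path)

print(df_back)
```

Without `index=False`, the row labels are written as an extra unnamed first column, which then shows up as `Unnamed: 0` when the file is read back.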

In [14]:
# Solution to Question 13
filtered_age = df[(df['age'] >= 20) & (df['age'] <= 30)]
filtered_age

|   | name | age | city |
|---|------|-----|------|
| 0 | Alice | 28 | New York |
| 1 | Bob | 22 | London |
| 2 | Charlie | 30 | Paris |
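
The same inclusive range filter can also be written with `Series.between`, which includes both endpoints by default. This is a sketch on made-up data: the row for Dana is hypothetical and is there only so that one row falls outside the range.

```python
import pandas as pd

# Hypothetical data; Dana (35) is added so the filter excludes something.
df_demo = pd.DataFrame({'name': ['Alice', 'Bob', 'Dana'], 'age': [28, 22, 35]})

# between() includes both endpoints by default (inclusive='both').
in_range = df_demo[df_demo['age'].between(20, 30)]
print(in_range)
```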


In [15]:
# Solution to Question 14
mean_age = df['age'].mean()
mean_age

26.666666666666668

In [16]:
# Solution to Question 15
sorted_df = df.sort_values(by='age', ascending=False)
sorted_df

|   | name | age | city |
|---|------|-----|------|
| 2 | Charlie | 30 | Paris |
| 0 | Alice | 28 | New York |
| 1 | Bob | 22 | London |


In [17]:
# Solution to Question 16
grouped = df.groupby('city')['age'].mean()
grouped

city
London      22.0
New York    28.0
Paris       30.0
Name: age, dtype: float64

In [18]:
# Solution to Question 17
df['is_adult'] = df['age'] >= 18
df

|   | name | age | city | is_adult |
|---|------|-----|------|----------|
| 0 | Alice | 28 | New York | True |
| 1 | Bob | 22 | London | True |
| 2 | Charlie | 30 | Paris | True |


In [19]:
# Solution to Question 18
# Assign the result back; fillna(inplace=True) on a selected column may not update df
df['age'] = df['age'].fillna(df['age'].mean())
df

|   | name | age | city | is_adult |
|---|------|-----|------|----------|
| 0 | Alice | 28 | New York | True |
| 1 | Bob | 22 | London | True |
| 2 | Charlie | 30 | Paris | True |
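
The `df` in this notebook has no missing ages, so the output for Question 18 is unchanged. To see the fill take effect, here is a minimal sketch on made-up data: `df_missing` and the row for Dana are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age, invented for illustration.
df_missing = pd.DataFrame({'name': ['Alice', 'Bob', 'Dana'],
                           'age': [28, 22, np.nan]})

# mean() skips NaN by default, so the fill value is (28 + 22) / 2 = 25.0.
df_missing['age'] = df_missing['age'].fillna(df_missing['age'].mean())
print(df_missing)
```

Assigning the result back to the column keeps the update explicit and avoids relying on `inplace=True` against a column selection.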


In [20]:
# Solution to Question 19
df_dropped_na = df.dropna()
df_dropped_na

|   | name | age | city | is_adult |
|---|------|-----|------|----------|
| 0 | Alice | 28 | New York | True |
| 1 | Bob | 22 | London | True |
| 2 | Charlie | 30 | Paris | True |
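
Since this `df` has no missing values, `dropna()` returns every row. The sketch below uses made-up data with gaps to show the default behavior next to the `subset` parameter, which restricts the check to chosen columns; `df_missing`, `complete_rows`, and `has_age` are names invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Bob has no age and Dana has no city.
df_missing = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Dana'],
    'age': [28, np.nan, 35],
    'city': ['New York', 'London', None],
})

# Default: drop every row that has at least one missing value.
complete_rows = df_missing.dropna()

# subset= restricts the check to the listed columns.
has_age = df_missing.dropna(subset=['age'])

print(complete_rows)
print(has_age)
```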


In [21]:
# Solution to Question 20
max_age_row = df[df['age'] == df['age'].max()]
max_age_row

|   | name | age | city | is_adult |
|---|------|-----|------|----------|
| 2 | Charlie | 30 | Paris | True |
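
The boolean mask above returns every row tied for the maximum age, as a DataFrame. When a single row is enough, `idxmax` is an alternative: it gives the index label of the first maximum, and `loc` then returns that row as a Series. A minimal sketch, with `df_demo` and `oldest` as made-up names:

```python
import pandas as pd

df_demo = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'], 'age': [28, 22, 30]})

# idxmax() returns the index label of the first maximum value.
oldest = df_demo.loc[df_demo['age'].idxmax()]
print(oldest)
```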
