# Data Programming in Python | BAIS:6040
# Pandas Basics - Exercise Solutions

## Exercises for selecting elements from a series (5 questions)

In [None]:
import numpy as np
import pandas as pd

data = np.arange(10)
index = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]

series = pd.Series(data=data, index=index)
series

1\. Get the first element in <i>series</i>. 

In [None]:
# Your answer here
series.iloc[0]  # positional access; series[0] on a string index relies on a deprecated fallback

2\. Get the last 3 elements in <i>series</i>. 

In [None]:
# Your answer here
series.iloc[-3:]  # or series.tail(3)

3\. Get the element in <i>series</i> that the index label 'c' refers to.

In [None]:
# Your answer here
series["c"]

4\. Get the elements in <i>series</i> that the index labels 'a', 'c', and 'e' refer to.

In [None]:
# Your answer here
series[["a", "c", "e"]]
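
Exercises 1 to 4 mix selection by position and selection by label. The following is a minimal sketch that makes the distinction explicit with `.iloc` (position) and `.loc` (label). The variable name `s` and the label slice at the end are illustrative additions, not part of the exercises.

```python
import numpy as np
import pandas as pd

# Same shape of data as the exercise series: values 0..9 labeled "a".."j".
s = pd.Series(np.arange(10), index=list("abcdefghij"))

first = s.iloc[0]                 # by position: the first element
last_three = s.iloc[-3:]          # by position: the last three elements
by_label = s.loc["c"]             # by label: the element labeled "c"
several = s.loc[["a", "c", "e"]]  # by label: a list of labels
label_slice = s.loc["a":"c"]      # label slices include the end label
```

Note that a positional slice excludes its end position, while a label slice includes its end label.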

5\. Print all elements in <i>series</i> with a tab between elements.

In [None]:
# Your answer here
for num in series:
    print(num, end="\t")
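
The loop above leaves a trailing tab after the last element. As an alternative sketch, the values can be joined with tabs before printing; the names `s` and `line` are hypothetical.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10), index=list("abcdefghij"))

# Join the string form of each value with tabs; no trailing tab is produced.
line = "\t".join(str(value) for value in s)
print(line)
```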

## Exercises for selecting elements from a dataframe (15 questions)

Let's continue to use the Titanic dataset, but with a different sample this time.

In [None]:
from seaborn import load_dataset  # assumed source of load_dataset; it provides the Titanic dataset

df = load_dataset("titanic")
df = df[["survived", "pclass", "sex", "age", "fare"]].sample(n=20, replace=False, random_state=2)
df

1\. Get the columns of <i>df</i>. 

In [None]:
# Your answer here
df.columns

2\. Get the index of <i>df</i>. 

In [None]:
# Your answer here
df.index

3\. Get the shape of <i>df</i>. 

In [None]:
# Your answer here
df.shape

4\. Get the number of rows, or records, in <i>df</i>. 

In [None]:
# Your answer here
len(df)

5\. Select all rows under the column <i>survived</i>. 

In [None]:
# Your answer here
df.survived  # or df['survived']

6\. Select the first 3 rows. 

In [None]:
# Your answer here
df[:3]  # or df.head(3)

7\. Select the last 3 rows.

In [None]:
# Your answer here
df[-3:]  # or df.tail(3)

8\. Select the element with row index label 615 under the column <i>fare</i>.

In [None]:
# Your answer here
df.loc[615, "fare"]  # a single label-based lookup instead of chained indexing

9\. Select the element in the third row and the fifth column.

In [None]:
# Your answer here
df.iloc[2, 4]

10\. Select all rows under the last column, not specifying the column label.

In [None]:
# Your answer here
df.iloc[:, -1]

11\. Select all rows under the last 2 columns, not specifying the column labels.

In [None]:
# Your answer here
df.iloc[:, -2:]

12\. Select the first 5 rows under the last 2 columns, not specifying the column labels.

In [None]:
# Your answer here
df.iloc[:5, -2:]
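
Because <i>df</i> is a random sample, its row labels are not 0, 1, 2, ..., so label-based and position-based selection give different rows. The sketch below illustrates this on a small hand-built dataframe; `toy` and its values are made up for illustration and are not taken from the Titanic data.

```python
import pandas as pd

# A toy frame whose row labels are shuffled, like a sampled dataframe.
toy = pd.DataFrame(
    {
        "survived": [0, 1, 1, 0],
        "pclass": [3, 1, 2, 3],
        "fare": [7.25, 71.28, 13.0, 8.05],
    },
    index=[615, 12, 840, 3],
)

by_label = toy.loc[615, "fare"]  # row labeled 615, column labeled "fare"
by_position = toy.iloc[0, 2]     # first row, third column (the same cell here)
last_col = toy.iloc[:, -1]       # all rows, last column
block = toy.iloc[:2, -2:]        # first two rows, last two columns
```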

13\. Select all rows with their column <i>sex</i> being male.

In [None]:
# Your answer here
df[df.sex == "male"]

14\. Select all rows with the column <i>fare</i> being between 50 (inclusive) and 100 (exclusive).

In [None]:
# Your answer here
df[(df.fare >= 50) & (df.fare < 100)]
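
The parentheses around each comparison matter because `&` binds more tightly than `>=` and `<`. A minimal sketch on made-up values follows. It also shows `Series.between` with `inclusive="left"` as an equivalent form, assuming a pandas version that accepts string values for `inclusive` (1.3 or later).

```python
import pandas as pd

# Made-up fares chosen to sit on and around both boundaries.
toy = pd.DataFrame({"fare": [7.25, 50.0, 99.9, 100.0, 263.0]})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (toy.fare >= 50) & (toy.fare < 100)
picked = toy[mask]

# Equivalent: closed on the left (50 included), open on the right (100 excluded).
same = toy[toy.fare.between(50, 100, inclusive="left")]
```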

15\. Print all values under the column <i>fare</i> with a tab between the values. 

In [None]:
# Your answer here
for num in df.fare:
    print(num, end="\t")

## Exercises for handling null values (6 questions)

The cells below rebuild <i>df</i> as a 10-row sample with some values set to null, and then create <i>df1</i>, <i>df2</i>, <i>df3</i>, <i>df4</i>, and <i>df5</i>, each of which is a copy of <i>df</i>.

In [None]:
df = load_dataset("titanic")
df = df[["survived", "pclass", "sex", "age", "fare"]].sample(n=10, replace=False, random_state=4)
df.iloc[1, 0] = None
df.iloc[7, 1] = None
df.iloc[3, 4] = None
df.iloc[[5, 9], :] = None

df

In [None]:
df1, df2, df3, df4, df5 = df.copy(), df.copy(), df.copy(), df.copy(), df.copy()

1\. Select all rows in <i>df</i> with any null values.

In [None]:
# Your answer here
df[df.isnull().any(axis=1)]

2\. Drop the rows in <i>df1</i> that have any null values. Make sure to assign the resulting dataframe back to <i>df1</i> to actually change <i>df1</i>.

In [None]:
# Your answer here
df1 = df1.dropna()
df1

3\. Drop the rows in <i>df2</i> in which all values are null. Make sure to assign the resulting dataframe back to <i>df2</i> to actually change <i>df2</i>.

In [None]:
# Your answer here
df2 = df2.dropna(how="all")
df2
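
Exercises 1 to 3 rely on the difference between "any value is null" and "all values are null" within a row. The sketch below shows both on a three-row toy dataframe; `toy` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Row 0 is complete, row 1 is partly null, row 2 is entirely null.
toy = pd.DataFrame(
    {"age": [22.0, np.nan, np.nan], "fare": [7.25, 8.05, np.nan]}
)

any_null = toy.isnull().any(axis=1)  # True where at least one value is null
all_null = toy.isnull().all(axis=1)  # True where every value is null

dropped_any = toy.dropna()           # default how="any": keeps only row 0
dropped_all = toy.dropna(how="all")  # drops only the fully null row 2
```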

4\. Fill the missing values under the column <i>sex</i> in <i>df3</i> with 'unknown'.

In [None]:
# Your answer here
df3["sex"] = df3["sex"].fillna("unknown")
df3

5\. Fill the missing values under the columns <i>pclass</i>, <i>age</i>, and <i>fare</i> in <i>df4</i> with the minimum value of their column.

In [None]:
# Your answer here
df4 = df4.fillna(value={"pclass": df4.pclass.min(), "age": df4.age.min(), "fare": df4.fare.min()})
df4
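
The dictionary passed to `fillna` maps each column label to its fill value. As a shorter sketch of the same idea, `fillna` also accepts a Series whose index holds column labels, so the per-column minimums can be passed directly. The dataframe `toy` and the list `cols` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "sex": ["male", None, "female"],
        "pclass": [3.0, np.nan, 1.0],
        "age": [np.nan, 30.0, 4.0],
        "fare": [7.25, np.nan, 80.0],
    }
)

cols = ["pclass", "age", "fare"]

# toy[cols].min() is a Series indexed by column label; min() skips nulls.
# Columns absent from that Series (here "sex") are left untouched.
filled = toy.fillna(toy[cols].min())
```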

6\. Fill all missing values in <i>df5</i> by propagating the next non-null observation backward.

In [None]:
# Your answer here
df5 = df5.bfill()  # fillna(method="bfill") is deprecated; a null with no later observation stays null
df5
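
Backward fill copies the next valid value upward, so a null at the end of a column has nothing to copy from and remains null. A minimal sketch contrasting it with forward fill on a made-up series `s`:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

back = s.bfill()  # each null takes the next valid value; the last stays null
fwd = s.ffill()   # each null takes the previous valid value; the first stays null
```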

## Exercises for aggregation and grouping (8 questions)

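Let's use the entire Titanic dataframe <i>df</i>. If <i>df</i> still holds the 10-row sample from the previous section, reload the full dataset with load_dataset("titanic") first.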

1\. What was the highest fare?

In [None]:
# Your answer here
df.fare.max()

2\. What was the lowest fare?

In [None]:
# Your answer here
df.fare.min()

3\. What was the mean age of the female passengers? (Select the female passengers first.)

In [None]:
# Your answer here
df[df.sex == "female"].age.mean()

4\. Select the rows, or passengers, who were under the age of ten and died. (Use two filtering conditions.)

In [None]:
# Your answer here
df[(df.age < 10) & (df.survived == 0)]  # "under ten" excludes age 10 itself

5\. What were the mean ages for those who survived and who died, respectively? In other words, group the dataframe by <i>survived</i> and get the mean age of each group. 

In [None]:
# Your answer here
df.groupby("survived").age.mean()

6\. Group the dataframe by <i>survived</i> and then by <i>pclass</i> and get the mean fare of each group. 

In [None]:
# Your answer here
df.groupby(["survived", "pclass"]).fare.mean()
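
Grouping by two columns returns a Series with a two-level index, one level per grouping key. The sketch below uses a small made-up dataframe to show how to read a single group and how `unstack` turns the result into a table. The names `toy`, `means`, `one`, and `table` are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "survived": [0, 0, 1, 1, 1],
        "pclass": [3, 3, 1, 1, 3],
        "fare": [8.0, 10.0, 80.0, 100.0, 12.0],
    }
)

# One mean per (survived, pclass) combination that occurs in the data.
means = toy.groupby(["survived", "pclass"]).fare.mean()

one = means.loc[(1, 1)]  # mean fare of first-class survivors

# Move the inner level (pclass) into columns; absent combinations become NaN.
table = means.unstack()
```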

7\. Get a copy of <i>df</i> in which the entire dataframe is sorted by <i>age</i> and then by <i>pclass</i> in descending order, respectively. 

In [None]:
# Your answer here
df.sort_values(by=["age", "pclass"], ascending=[False, False])  # both keys descending; returns a sorted copy
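
A minimal sketch of how `ascending` pairs with `by`: a single boolean applies to every sort key, while a list supplies one flag per key. The toy dataframe and its row labels are made up for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {"age": [30.0, 30.0, 4.0, np.nan], "pclass": [1, 3, 2, 1]},
    index=list("wxyz"),
)

# Both keys descending: ties on age are broken by larger pclass first.
both_desc = toy.sort_values(by=["age", "pclass"], ascending=False)

# One flag per key: age descending, pclass ascending within equal ages.
mixed = toy.sort_values(by=["age", "pclass"], ascending=[False, True])
```

In both cases the row with a missing age is placed last, which is the default `na_position`, and `toy` itself is left unchanged because `sort_values` returns a new dataframe.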

8\. Create a random sample of 50 rows with no duplicates from <i>df</i>. 

In [None]:
# Your answer here
df.sample(n=50, replace=False)
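
Without a seed, each call to `sample` returns a different set of rows. The sketch below, on a made-up 100-row dataframe, shows that passing `random_state` (as the earlier cells in this notebook do) makes the sample reproducible, and that `replace=False` never picks a row twice.

```python
import pandas as pd

toy = pd.DataFrame({"fare": range(100)})

# The same seed gives the same rows in the same order.
a = toy.sample(n=50, replace=False, random_state=0)
b = toy.sample(n=50, replace=False, random_state=0)
```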