# Exercise Set 2: Practicing Numpy and Pandas Basics
This exercise should take approximately 30 minutes to complete. Work through each part and run the code blocks to test your understanding.
We will cover:
- Importing Numpy and Pandas
- Creating Numpy arrays
- Creating Pandas DataFrames manually
- Basic filtering using datetime columns

## 1. Import the Packages
Import `numpy` as `np` and `pandas` as `pd`.

In [11]:
# Your code here
import numpy as np
import pandas as pd

## 2. Numpy Arrays

The Numpy function `np.arange()` creates an array of evenly spaced numbers. You define the range by providing its start and end point; the end point itself is not included. Optionally, a third argument sets the step size. Example:

In [12]:
print(np.arange(1,4))
# Running np.arange with arguments 1 and 4 creates an array of the integers from 1 up to, but not including, 4.

[1 2 3]
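
The optional step size mentioned above can be sketched as follows. This is a minimal illustration; the variable names `evens` and `countdown` are chosen here for the example and are not part of the exercise.

```python
import numpy as np

# The third argument is the step size: start at 0, stop before 10, step by 2.
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# A negative step counts downward; the stop value is still excluded.
countdown = np.arange(5, 0, -1)
print(countdown)  # [5 4 3 2 1]
```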


Exercise: create a 1D array of the numbers 1 through 10 using `np.arange()`. Then create a 2D array (3x3) of random integers from 0 to 99 using `np.random.randint()`. Its upper bound is exclusive, so pass 101 instead of 100 if the value 100 itself should be possible.

In [13]:
# Your code here
array_1d = np.arange(1, 11)
array_2d = np.random.randint(0, 100, (3, 3))
array_1d, array_2d

(array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]),
 array([[48, 61, 89],
        [ 3, 30, 55],
        [79,  7, 83]]))
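
Because the numbers are random, your output will differ from the one shown. As a rough sketch, the shape and bounds of such an array can be checked like this. The seed value and the name `sample` are assumptions made for the illustration, and seeding is only there to make repeated runs reproducible.

```python
import numpy as np

# Optional: fixing the seed makes repeated runs produce the same numbers.
np.random.seed(0)

# The upper bound is exclusive, so high=101 is needed to allow 100 itself.
sample = np.random.randint(0, 101, (3, 3))

print(sample.shape)                            # (3, 3)
print(sample.min() >= 0, sample.max() <= 100)  # True True
```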

## 3. Create a Pandas DataFrame
Use a dictionary to create a DataFrame with the following columns:
- `id`: a list of integers from 1 to 5
- `name`: a list of any 5 names
- `report_date`: a list of 5 different date strings (e.g., '2023-01-01')
- `random_person_id`: a list of 5 random integers between 1 and 100, where no two are the same. Hint: use `np.random.choice()` with `replace=False` to pick 5 values from an array of the numbers 1 to 100, which you have now learned to create. Without `replace=False`, the same number can be drawn more than once.

In [14]:
# Your code here
data = {
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Carla', 'David', 'Eli'],
    'report_date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-05', '2023-01-08'],
    # np.arange(1, 101) covers 1 through 100; replace=False guarantees no duplicates
    'random_person_id': np.random.choice(np.arange(1, 101), 5, replace=False)
}
df = pd.DataFrame(data)
df

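|   | id | name | report_date | random_person_id |
|---|----|------|-------------|------------------|
| 0 | 1 | Alice | 2023-01-01 | 22 |
| 1 | 2 | Bob | 2023-01-02 | 81 |
| 2 | 3 | Carla | 2023-01-03 | 52 |
| 3 | 4 | David | 2023-01-05 | 97 |
| 4 | 5 | Eli | 2023-01-08 | 42 |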
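
`np.random.choice()` samples with replacement by default, so duplicates are possible unless `replace=False` is passed. A minimal sketch of drawing distinct values and confirming they are unique follows. The names `candidates` and `unique_ids` are illustrative only.

```python
import numpy as np

candidates = np.arange(1, 101)  # the integers 1 through 100

# replace=False draws without replacement, so no value can repeat.
unique_ids = np.random.choice(candidates, size=5, replace=False)

print(unique_ids)
print(len(set(unique_ids)) == 5)  # True
```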


## 4. Convert to Datetime
Convert the `report_date` column to `datetime` format using `pd.to_datetime()`.

In [15]:
df['report_date'] = pd.to_datetime(df['report_date'])
df.dtypes

id                           int64
name                        object
report_date         datetime64[ns]
random_person_id             int64
dtype: object
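
Real data often contains entries that are not valid dates. One way to handle them, sketched here on a made-up Series called `raw`, is to state the expected layout with `format=` and pass `errors='coerce'`, which converts unparseable entries to `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical input with one entry that is not a date.
raw = pd.Series(['2023-01-01', '2023-01-02', 'not a date'])

# format= states the expected layout; errors='coerce' turns
# unparseable entries into NaT rather than raising an exception.
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(pd.api.types.is_datetime64_any_dtype(parsed))  # True
print(parsed.isna().sum())                           # 1
```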

## 5. Filtering
Filter the DataFrame to show only rows where `report_date` is after '2023-01-03'.

In [16]:
df[df['report_date'] > '2023-01-03']

|   | id | name | report_date | random_person_id |
|---|----|------|-------------|------------------|
| 3 | 4 | David | 2023-01-05 | 97 |
| 4 | 5 | Eli | 2023-01-08 | 42 |
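
Note that `>` is a strict comparison, so the row dated 2023-01-03 is excluded. The sketch below rebuilds a small stand-in DataFrame (named `demo` here, not part of the exercise) to show the same filter with an explicit `pd.Timestamp`, plus a date window using `between()`, which includes both endpoints by default.

```python
import pandas as pd

# Hypothetical stand-in for the exercise DataFrame.
demo = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carla', 'David', 'Eli'],
    'report_date': pd.to_datetime(
        ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-05', '2023-01-08']
    ),
})

cutoff = pd.Timestamp('2023-01-03')

# Strictly after the cutoff: the 2023-01-03 row itself is excluded.
after = demo[demo['report_date'] > cutoff]

# between() includes both endpoints by default.
window = demo[demo['report_date'].between(pd.Timestamp('2023-01-02'),
                                          pd.Timestamp('2023-01-05'))]

print(after['name'].tolist())   # ['David', 'Eli']
print(window['name'].tolist())  # ['Bob', 'Carla', 'David']
```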


Filter the DataFrame by either `name` or `random_person_id` to view the data for a single person.

In [17]:
df[df['name'] == 'David']

|   | id | name | report_date | random_person_id |
|---|----|------|-------------|------------------|
| 3 | 4 | David | 2023-01-05 | 97 |


In [21]:
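# 22 is an ID drawn in this particular run; substitute one that appears in your own DataFrame.
df[df['random_person_id'] == 22] # Make sure to use the double equals sign (==) here, as you're comparing values; a single = assigns.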

|   | id | name | report_date | random_person_id |
|---|----|------|-------------|------------------|
| 0 | 1 | Alice | 2023-01-01 | 22 |
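
The same pattern extends to more than one condition or more than one person. As a rough sketch on a stand-in DataFrame (`demo` is a name made up for this example), conditions are combined with `&` or `|`, each wrapped in its own parentheses, and `isin()` matches any value from a list.

```python
import pandas as pd

# Hypothetical stand-in for the exercise DataFrame.
demo = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Carla', 'David', 'Eli'],
})

# Each condition needs its own parentheses; & means "and", | means "or".
both = demo[(demo['id'] > 2) & (demo['name'] != 'Eli')]

# isin() keeps rows whose value appears in the given list.
some = demo[demo['name'].isin(['Alice', 'David'])]

print(both['name'].tolist())  # ['Carla', 'David']
print(some['name'].tolist())  # ['Alice', 'David']
```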
