# Introduction to Jupyter Notebook and Pandas

## What is a Jupyter Notebook?

A [Jupyter](https://jupyter.org/) notebook is a document that can contain live code with its results, visualizations, and rich text. It is widely used in data science and analytics. The cell below is a *code* cell: it contains a block of executable code.

Run the code by clicking on the cell below and then clicking the "Run" button (▶) in the toolbar at the top.

In [1]:
print(10 + 20)

30


### Exercise 1

We will first write some Python code to become familiar with Jupyter notebooks.

Complete the code cell below to find the sum of all values in `my_list`. Store the result in a new variable named `result`.

In [2]:
my_list = [11, 20, 52, 91, 90, 75, 74, 20, 21, 10, 14]

### BEGIN SOLUTION
result = 0

for num in my_list:
    result = result + num
### END SOLUTION

print(result)

478


#### 🧭 Check Your Work

Once you're done, run the code cell below to test correctness.

- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix any incorrect parts.

In [3]:
# Exercise 1 Autograder
import unittest

tc = unittest.TestCase()

tc.assertEqual(result, 478)
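
As an aside, the loop in Exercise 1 is only one way to add up a list. The sketch below assumes the same `my_list` and uses Python's built-in `sum()` function. The variable name `result_builtin` is only illustrative and is not part of the exercise.

```python
# Same list as in Exercise 1.
my_list = [11, 20, 52, 91, 90, 75, 74, 20, 21, 10, 14]

# The built-in sum() adds up every element of an iterable.
result_builtin = sum(my_list)

print(result_builtin)
```

Both approaches should produce the same total, so either one would satisfy the autograder check when the value is stored in `result`.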

## Introduction to Pandas

Pandas is a Python *library* for data manipulation and analysis. Although it is now widely used across data-related programming applications, it was initially developed for financial analysis at [AQR Capital Management](https://www.aqr.com/).

Note: A *library* in the context of programming is a collection of functions and other reusable code that others have already written for you.

Pandas is popular for many reasons:

1. 🏃🏿‍♀️ It's fast (in most cases where the dataset fits in your computer's memory).
2. 🪒 It supports most of the features required for data manipulation.
3. 💡 It lets you write less code and get more done.

### Creating a Pandas DataFrame

The first step in working with Pandas is to *import* the library. The code cell below contains code to import the `pandas` library.

Run the code cell below.

In [4]:
import pandas as pd

Once the library is imported, we can start creating tabular data with Pandas. In this tutorial, we'll create the following tabular dataset.

| product | quantity | unit_price |
|---:|---:|---:|
| N95 Face Mask | 10 | 2.50 |
| Hand Sanitizer | 2 | 4.50 |
| Alcohol Wipe | 20 | 3.75 |

### Exercise 2

Use the following code to create a `DataFrame` named `df_orders`.

```python
df_orders = pd.DataFrame({
    'product': ['N95 Face Mask', 'Hand Sanitizer', 'Alcohol Wipe'],
    'quantity': [10, 2, 20],
    'unit_price': [2.5, 4.5, 3.75]
})

display(df_orders)
```

In [5]:
# Type the code above in this cell
### BEGIN SOLUTION
df_orders = pd.DataFrame({
    'product': ['N95 Face Mask', 'Hand Sanitizer', 'Alcohol Wipe'],
    'quantity': [10, 2, 20],
    'unit_price': [2.5, 4.5, 3.75]
})

display(df_orders)
### END SOLUTION

|  | product | quantity | unit_price |
|---:|---:|---:|---:|
| 0 | N95 Face Mask | 10 | 2.5 |
| 1 | Hand Sanitizer | 2 | 4.5 |
| 2 | Alcohol Wipe | 20 | 3.75 |


#### 🧭 Check Your Work

Once you're done, run the code cell below to test correctness.

- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix any incorrect parts.

In [6]:
# Exercise 2 Autograder
df_orders_SOL = pd.DataFrame({
    'product': {0: 'N95 Face Mask', 1: 'Hand Sanitizer', 2: 'Alcohol Wipe'},
    'quantity': {0: 10, 1: 2, 2: 20},
    'unit_price': {0: 2.5, 1: 4.5, 2: 3.75},
})

pd.testing.assert_frame_equal(
    df_orders,
    df_orders_SOL
)
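
If you're curious what the autograder is doing, `pd.testing.assert_frame_equal` compares two DataFrames and raises an `AssertionError` when they differ. When they match, it returns without any output. The sketch below illustrates this with small made-up frames. The names `expected`, `same`, and `different` are hypothetical and not part of the exercise.

```python
import pandas as pd

# Three tiny frames: two identical, one with a different quantity.
expected = pd.DataFrame({'product': ['Alcohol Wipe'], 'quantity': [20]})
same = pd.DataFrame({'product': ['Alcohol Wipe'], 'quantity': [20]})
different = pd.DataFrame({'product': ['Alcohol Wipe'], 'quantity': [21]})

# Identical frames: the call completes silently.
pd.testing.assert_frame_equal(same, expected)

# Mismatched frames: the call raises AssertionError, which we catch here.
try:
    pd.testing.assert_frame_equal(different, expected)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

print(mismatch_detected)
```

This is why a silent autograder cell means your DataFrame matches the expected one.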

### Print a concise summary of a `DataFrame`

👉 A common first step in working with a DataFrame is to call its `info()` method. `info()` prints a concise summary of a DataFrame.

Type and run `df_orders.info()` in the code cell below.

In [7]:
### BEGIN SOLUTION
df_orders.info()
### END SOLUTION

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   product     3 non-null      object 
 1   quantity    3 non-null      int64  
 2   unit_price  3 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes


The `info()` method is especially useful when you work with a dataset that has a large number of rows and columns. The output tells us the number of rows, the number of columns, and the data type of each column. We won't worry too much about interpreting the result at the moment.
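
Some of the facts that `info()` prints can also be read directly from DataFrame attributes. The sketch below rebuilds `df_orders` so that it runs on its own, then uses the `shape` and `dtypes` attributes. Treat it as an optional illustration rather than part of the exercise.

```python
import pandas as pd

# Rebuild the DataFrame from Exercise 2 so this cell is self-contained.
df_orders = pd.DataFrame({
    'product': ['N95 Face Mask', 'Hand Sanitizer', 'Alcohol Wipe'],
    'quantity': [10, 2, 20],
    'unit_price': [2.5, 4.5, 3.75]
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = df_orders.shape
print(n_rows, n_cols)

# dtypes lists the data type of each column.
print(df_orders.dtypes)
```

The exact data type names shown for text columns may vary with your Pandas version.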

### Exercise 3

Create a DataFrame named `df_orders2` with the following data. Remember, Python is case-sensitive.

| product | quantity | unit_price |
|---:|---:|---:|
| Packing Tape | 100 | 3.50 |
| Large Box | 500 | 1.25 |
| Bubble Wrap | 200 | 5.00 |

In [8]:
# Create df_orders2 in this cell
### BEGIN SOLUTION
df_orders2 = pd.DataFrame({
    'product': ['Packing Tape', 'Large Box', 'Bubble Wrap'],
    'quantity': [100, 500, 200],
    'unit_price': [3.5, 1.25, 5]
})

display(df_orders2)
### END SOLUTION

|  | product | quantity | unit_price |
|---:|---:|---:|---:|
| 0 | Packing Tape | 100 | 3.5 |
| 1 | Large Box | 500 | 1.25 |
| 2 | Bubble Wrap | 200 | 5.0 |


#### 🧭 Check Your Work

Once you're done, run the code cell below to test correctness.

- ✔️ If the code cell runs without an error, you're good to move on.
- ❌ If the code cell throws an error, go back and fix any incorrect parts.

In [9]:
# Exercise 3 Autograder
df_orders2_SOL = pd.DataFrame({
    'product': {0: 'Packing Tape', 1: 'Large Box', 2: 'Bubble Wrap'},
    'quantity': {0: 100, 1: 500, 2: 200},
    'unit_price': {0: 3.5, 1: 1.25, 2: 5.0},
})

pd.testing.assert_frame_equal(
    df_orders2,
    df_orders2_SOL
)

👉 This is the end of the tutorial. Go ahead and submit the notebook!