<a href="https://colab.research.google.com/github/sensei-jirving/Online-DS-PT-01.24.22-cohort-notes/blob/main/Week_01/Lecture_02/Intro_to_Python_Packages_Numpy_Class.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro to Python Packages

- Python comes with many [Built-In Functions](https://docs.python.org/3/library/functions.html)
    - e.g., `print`, `type`, `range`, etc.
- But **most of the functionality we need as data scientists is not included** in base Python.


- We can **download other collections of functions and classes, called Packages** (a.k.a. libraries; a package is made up of one or more modules).
    - The Python Package Index (PyPI) is basically an app store for Python.

    - In a code cell, we can install any PyPI package we need using:
        - `!pip install <package name>`


- **Packages You Will Be Using in Stack 1:**
    - [NumPy](https://numpy.org/doc/stable/index.html#)
    - [Pandas](https://pandas.pydata.org/docs/)
    - [Matplotlib](https://matplotlib.org/)
    - [Seaborn](https://seaborn.pydata.org/)




- Thankfully, Google Colab has most of the packages we need already installed!
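Before running `!pip install`, it can be handy to check whether a package is already available in the current environment. The sketch below uses the standard library's `importlib.util.find_spec`, which returns `None` when a module can't be found. The helper name `is_installed` is hypothetical and is not part of any package. For these four packages the pip name matches the import name, but that is not always true (e.g., `pip install scikit-learn` is imported as `sklearn`).

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be found for import in this environment."""
    return importlib.util.find_spec(package_name) is not None


# Stack 1 packages (pip names and import names match for these four)
for name in ["numpy", "pandas", "matplotlib", "seaborn"]:
    status = "installed" if is_installed(name) else f"missing: run !pip install {name}"
    print(f"{name}: {status}")
```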


## Importing Packages & Modules

- When we import a package, we can import it under its full name:
```python
import numpy
```

- We can also give it an alias (a short nickname):
```python
import numpy as np
```

In [None]:
# import numpy
import numpy
numpy

<module 'numpy' from '/usr/local/lib/python3.7/dist-packages/numpy/__init__.py'>

In [None]:
## import numpy with an alias
import numpy as np
np

<module 'numpy' from '/usr/local/lib/python3.7/dist-packages/numpy/__init__.py'>

In [None]:
## functions and classes stored in the package are accessed with dot notation
np.array

<function numpy.array>
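A quick check shows that both import styles bind a name to the very same module object, and that dot notation is how we reach inside it. This is a small sketch that assumes NumPy is installed.

```python
import numpy
import numpy as np

# An alias is just a second name bound to the same module object
print(np is numpy)  # True

# Dot notation reaches into the module for its functions and classes
print(np.array is numpy.array)  # True
arr = np.array([1, 2, 3])
print(type(arr))  # <class 'numpy.ndarray'>
```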

In [None]:
## install a package that Colab doesn't include (uncomment to run)
# !pip install missingno

### Submodules

- Packages can be made of smaller pieces called submodules. 
    - Submodules allow functions to be organized in a helpful way.
    - NumPy has a submodule called `np.random` that contains functions for generating random numbers and for randomly selecting data.


In [None]:
## show the np.random module
np.random

<module 'numpy.random' from '/usr/local/lib/python3.7/dist-packages/numpy/random/__init__.py'>

In [None]:
## Can't choose a dinner option? Let numpy do it!
np.random.choice(['Cheeseburger','Chicken Tikka Masala','Lasagna', "Filet Mignon"])

'Filet Mignon'
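`np.random.choice` gives a different answer each time the cell runs. If you want the pick to be repeatable, one option is a seeded random `Generator` from `np.random.default_rng` (available in NumPy 1.17+). The seed value 42 below is an arbitrary choice.

```python
import numpy as np

dinner_options = ['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', 'Filet Mignon']

# default_rng creates a random Generator; a fixed seed makes its picks repeatable
rng = np.random.default_rng(seed=42)
pick = rng.choice(dinner_options)
print(pick)

# A second generator with the same seed makes the same pick
same_pick = np.random.default_rng(seed=42).choice(dinner_options)
print(pick == same_pick)  # True
```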

# Why NumPy?

- Python lists and tuples are not efficient for math on large amounts of data.
- Linear algebra offers a lot of helpful mathematical operations we can use.
- We need a way to store our data in an organized, grid-like structure that supports those operations.
>- The solution: numpy arrays!
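Here is one quick illustration of why arrays are more convenient for math than lists. Arithmetic operators on a list act on the list as a container, while on an array they act on every element. The sketch uses the calorie numbers from the next section.

```python
import numpy as np

calories_list = [740, 240, 408, 301]
calories_array = np.array(calories_list)

# Multiplying a list repeats it...
print(calories_list * 2)   # [740, 240, 408, 301, 740, 240, 408, 301]

# ...while multiplying an array works element by element
print(calories_array * 2)  # [1480  480  816  602]

# With a plain list, element-wise math needs an explicit loop
doubled_list = [c * 2 for c in calories_list]
print(doubled_list)        # [1480, 480, 816, 602]
```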



## Working with NumPy Arrays


- Make a `calories_per_serving` array with the calories per serving:

| Dish                 |   Calories Per Serving |
|:---------------------|-----------------------:|
| Cheeseburger         |                    740 |
| Chicken Tikka Masala |                    240 |
| Lasagna              |                    408 |
| Filet Mignon         |                    301 |



In [None]:
## calories per serving for each dish (source: www.calorieking.com)
calories_per_serving = np.array([740,240,408,301])
calories_per_serving

array([740, 240, 408, 301])


- Make a `prices` array with the prices:

| Dish                 |   Price |
|:---------------------|--------:|
| Cheeseburger         |    8.5  |
| Chicken Tikka Masala |   12.5  |
| Lasagna              |   11    |
| Filet Mignon         |   15.75 |


    

In [None]:
# price of each dish (source: https://www.numbeo.com/food-prices/)
prices = np.array([8.5,12.50,11, 15.75])
prices

array([ 8.5 , 12.5 , 11.  , 15.75])
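Every NumPy array has a single data type (`dtype`) shared by all its elements. This is why the `11` in `prices` was displayed as `11.`: it was converted to a float to match the other prices. The exact integer type NumPy chooses for `calories_per_serving` is platform-dependent, so the comment below hedges it.

```python
import numpy as np

calories_per_serving = np.array([740, 240, 408, 301])
prices = np.array([8.5, 12.50, 11, 15.75])

# Every element of an array shares one data type (dtype)
print(calories_per_serving.dtype)  # an integer type, e.g. int64 (platform-dependent)
print(prices.dtype)                # float64: the single int (11) was upcast

# .shape gives the number of elements along each dimension
print(prices.shape)                # (4,)
```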

### 💡 How to remind ourselves the names/integer index of each item

- Make an `options_array` of the names of the dinner options:
    - 'Cheeseburger', 'Chicken Tikka Masala','Lasagna', "Filet Mignon"



In [None]:
## arrays can store strings
options_array = np.array(['Cheeseburger','Chicken Tikka Masala','Lasagna', "Filet Mignon"])
options_array

array(['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', 'Filet Mignon'],
      dtype='<U20')
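The `dtype='<U20'` above means "Unicode strings of up to 20 characters" (the `<` is the byte order). NumPy sized it to fit the longest name, "Chicken Tikka Masala". One consequence worth knowing is that assigning a longer string later truncates it silently. The replacement dish name in this sketch is made up for illustration.

```python
import numpy as np

options_array = np.array(['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', 'Filet Mignon'])

# The string width was chosen to fit the longest name
longest = max(len(option) for option in options_array)
print(longest)  # 20

# Assigning a longer string silently truncates it to 20 characters
renamed = options_array.copy()
renamed[0] = 'Double Bacon Cheeseburger Deluxe'  # hypothetical dish name
print(renamed[0])  # Double Bacon Cheeseb
```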

##### Using `enumerate`

- We can use the `enumerate` function to loop over each dinner option along with its integer index.


In [None]:
## I can't remember what index is what! 
# help me, enumerate!
for i,option in enumerate(options_array):
    print(f"{i}: {option}")
    # print("{}: {}".format(i, option))

0: Cheeseburger
1: Chicken Tikka Masala
2: Lasagna
3: Filet Mignon


- We will want to reuse this, so let's wrap it in a simple function!

In [None]:
## make the index_report function

def index_report():
    for i,option in enumerate(options_array):
        print(f"{i}: {option}")
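`index_report` reads the global `options_array`, so it only works for that one array. A slightly more reusable variant passes the array in and returns the text, which also makes it easy to test. The name `make_index_report` is hypothetical.

```python
import numpy as np


def make_index_report(options):
    """Hypothetical variant of index_report: takes the array as an argument
    and returns the report text instead of reading a global variable."""
    lines = [f"{i}: {option}" for i, option in enumerate(options)]
    return "\n".join(lines)


options_array = np.array(['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', 'Filet Mignon'])
print(make_index_report(options_array))
```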

### Q1: What would our total calories be if we ate:

- 2 servings of Lasagna, 1 filet mignon, and 3 cheeseburgers?

>Total calories = the sum over all dishes of (calories per serving * number of servings ordered).
- Hint: Make a `servings` array.

In [None]:
index_report()

0: Cheeseburger
1: Chicken Tikka Masala
2: Lasagna
3: Filet Mignon


In [None]:
# 3 cheeseburgers, 0 tikka masala, 2 lasagna, 1 filet mignon (same order as options_array)
servings = np.array([3,0,2,1])
servings

array([3, 0, 2, 1])

In [None]:
## calculate total calories
(servings * calories_per_serving).sum()

3337
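Written out, the calculation is 3*740 + 0*240 + 2*408 + 1*301 = 2220 + 0 + 816 + 301 = 3337. "Multiply element by element, then sum" is exactly a dot product, so `np.dot` or the `@` operator give the same result in one step.

```python
import numpy as np

calories_per_serving = np.array([740, 240, 408, 301])
servings = np.array([3, 0, 2, 1])

# Element-wise product, then sum: 3*740 + 0*240 + 2*408 + 1*301
per_dish = servings * calories_per_serving
print(per_dish)        # [2220    0  816  301]
print(per_dish.sum())  # 3337

# The same "multiply then sum" is a dot product
print(np.dot(servings, calories_per_serving))  # 3337
print(servings @ calories_per_serving)         # 3337
```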

### Q2: What would our total bill be?

In [None]:
## calculate the total bill
(prices * servings).sum()

63.25

### Q3: What if we decided to add 2 orders of Tikka Masala?

- Hmmm...what index was Tikka Masala?  🤔
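Besides printing the index report, we can also ask NumPy directly where a name sits. Here are two possible approaches: a boolean comparison passed to `np.where`, or a name-to-index dictionary built once with `enumerate`. The variable names are illustrative.

```python
import numpy as np

options_array = np.array(['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', 'Filet Mignon'])

# Option 1: compare every element, then np.where returns the matching positions
tikka_index = np.where(options_array == 'Chicken Tikka Masala')[0][0]
print(tikka_index)  # 1

# Option 2: build a name -> index lookup dictionary once and reuse it
name_to_index = {option: i for i, option in enumerate(options_array)}
print(name_to_index['Chicken Tikka Masala'])  # 1
```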


In [None]:
# run our function
index_report()

0: Cheeseburger
1: Chicken Tikka Masala
2: Lasagna
3: Filet Mignon


In [None]:
## use the index to set the servings of Chicken Tikka Masala to 2
servings[1] = 2
servings

array([3, 2, 2, 1])
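`servings[1] = 2` changes the array in place, so the original order `[3, 0, 2, 1]` is gone. The dictionary and DataFrame later in this notebook show the updated `[3, 2, 2, 1]`. If you want to keep both versions, one option is to copy first:

```python
import numpy as np

servings = np.array([3, 0, 2, 1])

# Work on a copy so the original order is left unchanged
updated_servings = servings.copy()
updated_servings[1] = 2  # index 1 is Chicken Tikka Masala

print(servings)          # [3 0 2 1]
print(updated_servings)  # [3 2 2 1]
```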

In [None]:
# calculate total bill
(prices * servings).sum()

88.25

### Q4: What if there were discounted happy hour promotions?
- Cheeseburgers and Lasagna are both 25% off
> Hint: make a `discounts` array.


In [None]:
prices

array([ 8.5 , 12.5 , 11.  , 15.75])

In [None]:
## discounts array
discounts = np.array([.25,0,.25,0])
discounts

array([0.25, 0.  , 0.25, 0.  ])

In [None]:
## discounted prices
discounted_prices = prices - discounts * prices
discounted_prices

array([ 6.375, 12.5  ,  8.25 , 15.75 ])

In [None]:
## calculate the total prices with the discounts
(discounted_prices * servings).sum()

76.375
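`price - discount * price` can also be written as `price * (1 - discount)`, which reads as "pay 75% of the price." Comparing the two bills shows how much the promotion saves. This sketch reuses the arrays above, where index 0 (Cheeseburger) and index 2 (Lasagna) are discounted.

```python
import numpy as np

prices = np.array([8.5, 12.50, 11, 15.75])
discounts = np.array([0.25, 0, 0.25, 0])
servings = np.array([3, 2, 2, 1])

# price - discount*price is the same as price * (1 - discount)
discounted_prices = prices * (1 - discounts)
print(discounted_prices)

full_bill = (prices * servings).sum()                   # 88.25
discounted_bill = (discounted_prices * servings).sum()  # 76.375
print(full_bill - discounted_bill)  # 11.875 saved by the happy hour promotion
```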

## Wouldn't it be nice...
>- if we had a way to group ALL of this information, without memorizing indices, that was really easy to visualize?
- Hmmm....🤔 - a dictionary might work!

- Make a dinner_data dictionary that contains the data from:
    - prices
    - calories_per_serving
    - discounts
    - and servings

In [None]:
# We could use a dictionary for the dish names, prices, discounts, calories, and servings
dinner_data = {
    'Dish': options_array,
    'prices': prices,
    'discounts': discounts,
    'calories': calories_per_serving,
    'servings': servings,
}
dinner_data


{'Dish': array(['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', 'Filet Mignon'],
       dtype='<U20'),
 'calories': array([740, 240, 408, 301]),
 'discounts': array([0.25, 0.  , 0.25, 0.  ]),
 'prices': array([ 8.5 , 12.5 , 11.  , 15.75]),
 'servings': array([3, 2, 2, 1])}

- Hmmm, that's **better**, but it's still really hard to see the data aligned.

> 🐼 PANDAS TO THE RESCUE!

In [None]:
import pandas as pd

In [None]:
## make a dataframe from our dinner_data and use the dish names as the row index
df = pd.DataFrame(dinner_data)
df = df.set_index('Dish')
df

| Dish                 | prices | discounts | calories | servings |
|:---------------------|-------:|----------:|---------:|---------:|
| Cheeseburger         |   8.5  |      0.25 |      740 |        3 |
| Chicken Tikka Masala |  12.5  |      0.0  |      240 |        2 |
| Lasagna              |  11.0  |      0.25 |      408 |        2 |
| Filet Mignon         |  15.75 |      0.0  |      301 |        1 |


In [None]:

## the dish names are now the row index (labels)
df.index

Index(['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', 'Filet Mignon'], dtype='object', name='Dish')

In [None]:
## .loc selects by [row label, column label]
df.loc['Cheeseburger','prices']

8.5
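`.loc` selects by label. Its counterpart `.iloc` selects by integer position, like NumPy indexing. The sketch rebuilds a smaller version of `df` with only two columns. `prices` is still column 0, as it is in the notebook's `df`, so the positions line up.

```python
import pandas as pd

df = pd.DataFrame({
    'Dish': ['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', 'Filet Mignon'],
    'prices': [8.5, 12.5, 11.0, 15.75],
    'servings': [3, 2, 2, 1],
}).set_index('Dish')

# .loc selects by label: [row label, column label]
print(df.loc['Chicken Tikka Masala', 'prices'])  # 12.5

# .iloc selects by integer position: [row position, column position]
print(df.iloc[1, 0])  # 12.5 (the same cell)

# A list of labels returns several rows at once
print(df.loc[['Cheeseburger', 'Lasagna'], 'servings'])
```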

In [None]:
## select a single column by name
df['prices']

Dish
Cheeseburger             8.50
Chicken Tikka Masala    12.50
Lasagna                 11.00
Filet Mignon            15.75
Name: prices, dtype: float64

In [None]:
## to calculate the order total using the dataframe, first look at the data again
df

| Dish                 | prices | discounts | calories | servings |
|:---------------------|-------:|----------:|---------:|---------:|
| Cheeseburger         |   8.5  |      0.25 |      740 |        3 |
| Chicken Tikka Masala |  12.5  |      0.0  |      240 |        2 |
| Lasagna              |  11.0  |      0.25 |      408 |        2 |
| Filet Mignon         |  15.75 |      0.0  |      301 |        1 |


In [None]:
## discounted price = price - price * discount (computed for every row at once)
df['discounted prices'] = df['prices'] - df['prices'] * df['discounts']
df

| Dish                 | prices | discounts | calories | servings | discounted prices |
|:---------------------|-------:|----------:|---------:|---------:|------------------:|
| Cheeseburger         |   8.5  |      0.25 |      740 |        3 |             6.375 |
| Chicken Tikka Masala |  12.5  |      0.0  |      240 |        2 |            12.5   |
| Lasagna              |  11.0  |      0.25 |      408 |        2 |             8.25  |
| Filet Mignon         |  15.75 |      0.0  |      301 |        1 |            15.75  |


In [None]:
## total bill with the discounts (matches the NumPy answer)
(df['discounted prices'] * df['servings']).sum()

76.375

### Pandas is Built On Top of Numpy

> Pandas is built ON TOP of NumPy and **therefore can do many of the same things as numpy arrays!**

In [None]:
## you can get the data as an array using .values
df.values

array([[8.500e+00, 2.500e-01, 7.400e+02, 3.000e+00],
       [1.250e+01, 0.000e+00, 2.400e+02, 2.000e+00],
       [1.100e+01, 2.500e-01, 4.080e+02, 2.000e+00],
       [1.575e+01, 0.000e+00, 3.010e+02, 1.000e+00]])
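The pandas documentation for `DataFrame.values` recommends `DataFrame.to_numpy()` instead, and it returns the same kind of array. Because one array can hold only one dtype, the integer `calories` and `servings` columns are upcast to floats, which is why the output above is in scientific notation. This sketch rebuilds `df` without the `discounted prices` column.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'prices': [8.5, 12.5, 11.0, 15.75],
        'discounts': [0.25, 0.0, 0.25, 0.0],
        'calories': [740, 240, 408, 301],
        'servings': [3, 2, 2, 1],
    },
    index=pd.Index(['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', 'Filet Mignon'], name='Dish'),
)

# .to_numpy() returns the data as a NumPy array; mixed int/float columns become float64
arr = df.to_numpy()
print(arr.dtype, arr.shape)  # float64 (4, 4)

# A single column converts to a 1-D array and keeps its own dtype
print(df['calories'].to_numpy())  # [740 240 408 301]
```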

In [None]:
## what is the average price of our foods?
df['prices'].mean()

11.9375

In [None]:
## how many servings did we order in total?
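One possible answer to the question above is to sum the `servings` column. This sketch rebuilds a one-column version of `df` so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    {'servings': [3, 2, 2, 1]},
    index=pd.Index(['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', 'Filet Mignon'], name='Dish'),
)

# Sum the servings column across all dishes
total_servings = df['servings'].sum()
print(total_servings)  # 8
```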


> We will talk MUCH more about Pandas and DataFrames next week!