
# Intro to Python Packages

> [Class Notebook](https://colab.research.google.com/drive/1Ilze7Y90HQeGbsWbxPEQm2a0gaCylgtM?usp=sharing)

- Python comes with many [Built-In Functions](https://docs.python.org/3/library/functions.html)
    - e.g., `print`, `type`, `range`, etc.
- But **most of the functionality we need as data scientists is not included** in base Python.


- We can **download other collections of functions and classes, called packages** (A.K.A. libraries; a package is made up of one or more modules)
    - Python has a package index (PyPI) that is basically like an app store for Python.

    - In a code cell, we can install any PyPI package we need using:
        - `!pip install <package name>`


- **Packages You Will Be Using in Stack 1:**
    - [NumPy](https://numpy.org/doc/stable/index.html#)
    - [Pandas](https://pandas.pydata.org/docs/)
    - [Matplotlib](https://matplotlib.org/)
    - [Seaborn](https://seaborn.pydata.org/)




- Thankfully, Google Colab has most of the packages we need already installed!
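Before running `pip install`, you can check from Python whether a package is already importable. The sketch below uses the standard-library function `importlib.util.find_spec`, which returns `None` when a module cannot be found; the helper name `is_installed` is hypothetical, just for illustration.

```python
import importlib.util

def is_installed(package_name):
    """Return True if `package_name` can be imported here (hypothetical helper)."""
    return importlib.util.find_spec(package_name) is not None

# Check the Stack 1 packages in the current environment
for name in ["numpy", "pandas", "matplotlib", "seaborn"]:
    status = "installed" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```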


## Importing Packages & Modules

- When we import a package, we can simply import it under its full name:
```python
import numpy
```

- We can also give it an alias/handle (a short nickname):
```python
import numpy as np
```

In [None]:
# import numpy
import numpy
numpy

<module 'numpy' from '/usr/local/lib/python3.7/dist-packages/numpy/__init__.py'>

In [None]:
## import numpy with an alias
import numpy as np
np

<module 'numpy' from '/usr/local/lib/python3.7/dist-packages/numpy/__init__.py'>
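Both imports above point at the same module object: Python loads a module once per session and caches it in `sys.modules`, so an alias is just a second name for it. A quick check, assuming NumPy is installed:

```python
import sys
import numpy
import numpy as np

# The alias and the full name refer to the very same module object
print(np is numpy)                 # True
print(sys.modules["numpy"] is np)  # True: loaded once and cached
```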

### Submodules

- Packages can be made of smaller pieces called submodules. 
    - Submodules allow functions to be organized in a helpful way.
    - NumPy has a submodule called `np.random` that contains functions related to generating or selecting data based on random chance.


In [None]:
## show the module
np.random

<module 'numpy.random' from '/usr/local/lib/python3.7/dist-packages/numpy/random/__init__.py'>

In [None]:
## Can't choose a dinner option? Let numpy do it!
dinner_options = ['Cheeseburger','Chicken Tikka Masala','Lasagna', "Filet Mignon"]
np.random.choice(dinner_options)

'Lasagna'
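Because `np.random.choice` picks at random, rerunning the cell can give a different dinner. If you need a repeatable result (for example, so a classmate gets the same answer), one option is a NumPy generator created with `np.random.default_rng` and a fixed seed (available in NumPy 1.17 and later). The seed value `42` is arbitrary.

```python
import numpy as np

dinner_options = ['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', "Filet Mignon"]

# Two generators built with the same seed make the same sequence of picks
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

pick_a = rng_a.choice(dinner_options)
pick_b = rng_b.choice(dinner_options)
print(pick_a, pick_b, pick_a == pick_b)
```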

# Why NumPy?

- Python lists and tuples are not efficient for doing math on large amounts of data.
- Linear algebra offers many helpful mathematical operations we can use.
- We need a way to store our data in an organized, linear (sequential) fashion.
>- The solution: NumPy arrays!
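To see one concrete difference, compare what `* 2` does to a list and to an array: the list repeats itself, while the array applies the math to every element at once (a "vectorized" operation), with no explicit loop.

```python
import numpy as np

nums_list = [1, 2, 3, 4]
nums_array = np.array(nums_list)

list_times_two = nums_list * 2    # list repetition: [1, 2, 3, 4, 1, 2, 3, 4]
array_times_two = nums_array * 2  # element-wise math: array([2, 4, 6, 8])

# Getting the element-wise result from a list needs a loop (here, a list comprehension)
list_doubled = [n * 2 for n in nums_list]

print(list_times_two)
print(array_times_two)
print(list_doubled)
```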

## Working with NumPy Arrays


- Make a `calories_per_serving` array with the calories per serving:

|                      |   Calories Per Serving |
|:---------------------|-----------------------:|
| Cheeseburger         |                    740 |
| Chicken Tikka Masala |                    240 |
| Lasagna              |                    408 |
| Filet Mignon         |                    301 |



In [None]:
## How many calories are in each? (from www.calorieking.com)
calories_per_serving = np.array([740,240,408,301])
calories_per_serving

array([740, 240, 408, 301])


- Make a `prices` array with the prices:

|                      |   Price |
|:---------------------|--------:|
| Cheeseburger         |    8.5  |
| Chicken Tikka Masala |   12.5  |
| Lasagna              |   11    |
| Filet Mignon         |   15.75 |


    

In [None]:
# what is the price? https://www.numbeo.com/food-prices/
prices = np.array([8.50, 12.50, 11.00,15.75])
prices

array([ 8.5 , 12.5 , 11.  , 15.75])
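Every NumPy array has a single data type (`dtype`) and a `shape`, and checking them is a quick sanity test. The calorie counts are all whole numbers, so NumPy stores them as integers, while the prices contain decimals and become floats. (The exact integer width, e.g. `int64` vs `int32`, can depend on your platform.)

```python
import numpy as np

calories_per_serving = np.array([740, 240, 408, 301])
prices = np.array([8.50, 12.50, 11.00, 15.75])

print(calories_per_serving.dtype, calories_per_serving.shape)  # an integer dtype, (4,)
print(prices.dtype, prices.shape)                              # float64 (4,)

# Mixing ints and floats in one array upcasts everything to float
mixed = np.array([740, 8.5])
print(mixed.dtype)  # float64
```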

### What would our total calories be if we ate:

- 2 servings of Lasagna, 1 Filet Mignon, and 3 Cheeseburgers?

>Order total = the sum, over all items, of (price × number of servings ordered). The calorie total works the same way, using calories per serving instead of price.
- Hint: Make a `servings` array.

In [None]:
# 2 servings of Lasagna, 1 Filet Mignon, and 3 Cheeseburgers (same order as the tables above)
servings = np.array([3,0,2,1])
servings

array([3, 0, 2, 1])

In [None]:
## Calculate total calories
np.sum(calories_per_serving * servings)

3337
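Multiplying two arrays element-wise and then summing is exactly a dot product, so NumPy can do it in one step with `np.dot` (or the `@` operator). Written out, the calorie total is 3 × 740 + 0 × 240 + 2 × 408 + 1 × 301 = 2220 + 0 + 816 + 301 = 3337.

```python
import numpy as np

calories_per_serving = np.array([740, 240, 408, 301])
servings = np.array([3, 0, 2, 1])  # Cheeseburger, Tikka Masala, Lasagna, Filet Mignon

total_elementwise = np.sum(calories_per_serving * servings)
total_dot = np.dot(calories_per_serving, servings)
total_matmul = calories_per_serving @ servings

print(total_elementwise, total_dot, total_matmul)  # 3337 3337 3337
```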

#### What would our total bill be?

In [None]:
## calculate the total bill
np.sum(servings * prices)

63.25

### What if we decided to add 2 orders of Tikka Masala?

- Hmmm...what index was Tikka Masala?  🤔

## 💡 How to remind ourselves the names/integer index of each item

- Make an `options_array` of the names of the dinner options:
    - 'Cheeseburger', 'Chicken Tikka Masala','Lasagna', "Filet Mignon"



In [None]:
## arrays can store strings
options_array = np.array(['Cheeseburger','Chicken Tikka Masala','Lasagna', "Filet Mignon"])
options_array

array(['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', 'Filet Mignon'],
      dtype='<U20')
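The `dtype='<U20'` in the output means each element is a Unicode string of at most 20 characters, the length of the longest name, 'Chicken Tikka Masala'. One consequence worth knowing: assigning a longer string into such an array silently cuts it off. The array name `demo_names` and the longer dish name below are made up for illustration (a separate array, so the notebook's `options_array` is left alone).

```python
import numpy as np

demo_names = np.array(['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', "Filet Mignon"])
print(demo_names.dtype)  # <U20

# A hypothetical 26-character name is truncated to 20 characters on assignment
demo_names[3] = 'Filet Mignon with Potatoes'
print(demo_names[3])  # Filet Mignon with Po
```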

#### Using Enumerate 

- We can use the built-in `enumerate` function to loop over the array and get each dinner option together with its integer index.


In [None]:
## I can't remember what index is what! 
for i,food in enumerate(options_array):
    print(f"{i}: {food}")

0: Cheeseburger
1: Chicken Tikka Masala
2: Lasagna
3: Filet Mignon
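`enumerate` is handy for printing a cheat sheet. If you already know the name and just want its position, one alternative is to compare the array to that name (which gives a boolean array) and ask NumPy where it is `True` with `np.flatnonzero`; plain Python lists offer `.index()` for the same job.

```python
import numpy as np

options_array = np.array(['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', "Filet Mignon"])

is_tikka = options_array == 'Chicken Tikka Masala'  # array([False,  True, False, False])
tikka_index = int(np.flatnonzero(is_tikka)[0])
print(tikka_index)  # 1

# The list-based equivalent
print(options_array.tolist().index('Chicken Tikka Masala'))  # 1
```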


- Since we will want to reuse this, we can wrap it in a simple function!

In [None]:
def index_report():
    """Use enumerate to print the integer index of each item in options_array."""
    for i, food in enumerate(options_array):
        print(f"{i}: {food}")

### What if we decided to add 2 orders of Tikka Masala?

In [None]:
index_report()

0: Cheeseburger
1: Chicken Tikka Masala
2: Lasagna
3: Filet Mignon


In [None]:
## use the index to set the number of Chicken Tikka Masala servings to 2
servings[1] = 2
servings

array([3, 2, 2, 1])
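Assigning to `servings[1]` changes the array in place, so the old value is gone. If you want to experiment while keeping the original order, one approach is to make an independent copy first with `.copy()`. The names `new_order` and `alias` are just for illustration.

```python
import numpy as np

servings = np.array([3, 0, 2, 1])

new_order = servings.copy()  # independent copy with its own data
new_order[1] = 2             # add 2 Tikka Masalas to the copy only

print(servings)   # [3 0 2 1]  (original unchanged)
print(new_order)  # [3 2 2 1]

# Without .copy(), both names point to the same array
alias = servings
alias[1] = 2
print(servings)   # [3 2 2 1]
```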

In [None]:
# calculate total bill
(servings*prices).sum()

88.25

### What if there were discounted happy hour promotions?
- Cheeseburgers and Lasagna are both 25% off
> Hint: make a `discounts` array.


In [None]:
index_report()

0: Cheeseburger
1: Chicken Tikka Masala
2: Lasagna
3: Filet Mignon


In [None]:
## discounts
discounts = np.array([.25, 0, .25, 0])  # 25% off Cheeseburger (index 0) and Lasagna (index 2)

In [None]:
## discounted prices
discounted_prices = prices - prices*discounts
discounted_prices

array([ 6.375, 12.5  ,  8.25 , 15.75 ])

In [None]:
np.sum(discounted_prices * servings)

76.375
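Subtracting each discount amount is the same as paying `(1 - discount)` of the price, which some people find easier to read. Because these are floats, `np.allclose` and `np.isclose` are the safe way to compare results.

```python
import numpy as np

prices = np.array([8.50, 12.50, 11.00, 15.75])
discounts = np.array([0.25, 0, 0.25, 0])  # 25% off Cheeseburger and Lasagna
servings = np.array([3, 2, 2, 1])

discounted_a = prices - prices * discounts
discounted_b = prices * (1 - discounts)

print(np.allclose(discounted_a, discounted_b))  # True

total = np.sum(discounted_b * servings)
print(total)  # 76.375
```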

## Wouldn't it be nice...
>- if we had a way to group ALL of this information, without memorizing indices, that was really easy to visualize?
- Hmmm....🤔 - a dictionary might work!

- Make a dinner_data dictionary that contains the data from:
    - prices
    - calories_per_serving
    - discounts
    - and servings

In [None]:
# We could use a dictionary for price, calories per serving, discount, and servings
dinner_data = {"Dinner Option": options_array,
               "Price": prices,
               "Calories Per Serving": calories_per_serving,
               "Discount": discounts,
               "Servings": servings}
dinner_data

{'Calories Per Serving': array([740, 240, 408, 301]),
 'Dinner Option': array(['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', 'Filet Mignon'],
       dtype='<U20'),
 'Discount': array([0.25, 0.  , 0.25, 0.  ]),
 'Price': array([ 8.5 , 12.5 , 11.  , 15.75]),
 'Servings': array([3, 2, 2, 1])}
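The dictionary groups the columns by name, but each dish is still found by position. Another option is to key a dictionary by dish name using `zip`; `.tolist()` converts the NumPy values to plain Python strings and floats first. The name `price_by_dish` is illustrative.

```python
import numpy as np

options_array = np.array(['Cheeseburger', 'Chicken Tikka Masala', 'Lasagna', "Filet Mignon"])
prices = np.array([8.50, 12.50, 11.00, 15.75])

# Pair each dish name with its price: {'Cheeseburger': 8.5, ...}
price_by_dish = dict(zip(options_array.tolist(), prices.tolist()))

print(price_by_dish['Chicken Tikka Masala'])  # 12.5
```

This only covers one column at a time, which is part of why a table-like structure is more convenient.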

- Hmmm, that's **better**, but it's still really hard to see the data aligned.

> 🐼 PANDAS TO THE RESCUE!

In [None]:
import pandas as pd

In [None]:
## make a dataframe from our dinner_data
df = pd.DataFrame(dinner_data)
df

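|    | Dinner Option        |   Price |   Calories Per Serving |   Discount |   Servings |
|---:|:---------------------|--------:|-----------------------:|-----------:|-----------:|
|  0 | Cheeseburger         |    8.5  |                    740 |       0.25 |          3 |
|  1 | Chicken Tikka Masala |   12.5  |                    240 |       0    |          2 |
|  2 | Lasagna              |   11    |                    408 |       0.25 |          2 |
|  3 | Filet Mignon         |   15.75 |                    301 |       0    |          1 |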


In [None]:
## calculate the order total using the dataframe 
np.sum((df['Price'] - df['Price']*df["Discount"]) * df['Servings'])

76.375

### Pandas is Built On Top of Numpy

> Pandas is built ON TOP of NumPy and **therefore can do many of the same things as numpy arrays!**

In [None]:
## you can get the data as a NumPy array using .values (newer pandas code often uses .to_numpy())
df.values

array([['Cheeseburger', 8.5, 740, 0.25, 3],
       ['Chicken Tikka Masala', 12.5, 240, 0.0, 2],
       ['Lasagna', 11.0, 408, 0.25, 2],
       ['Filet Mignon', 15.75, 301, 0.0, 1]], dtype=object)
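The `dtype=object` appears because the DataFrame mixes text and numbers, so NumPy falls back to storing generic Python objects. Selecting only the numeric columns first gives a regular float array you can do math on. This sketch uses `.to_numpy()`, which the pandas documentation recommends over `.values`, assumes pandas is installed, and builds a separate `demo_df` so the notebook's `df` is untouched.

```python
import numpy as np
import pandas as pd

demo_df = pd.DataFrame({
    "Dinner Option": ["Cheeseburger", "Chicken Tikka Masala", "Lasagna", "Filet Mignon"],
    "Price": [8.50, 12.50, 11.00, 15.75],
    "Discount": [0.25, 0.0, 0.25, 0.0],
})

mixed = demo_df.to_numpy()                           # text + numbers -> dtype=object
numeric = demo_df[["Price", "Discount"]].to_numpy()  # only floats -> dtype=float64

print(mixed.dtype)    # object
print(numeric.dtype)  # float64
print(numeric.shape)  # (4, 2)
```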

In [None]:
## what is the average price of our foods?
df['Price'].mean()

11.9375

In [None]:
## how many servings did we order in total?
df['Servings'].sum()

8

> We will talk MUCH more about Pandas and DataFrames next week!