<strong>NumPy</strong>
-- A foundational Python library for numerical computing, and the base on which other powerful tools such as SciPy, Matplotlib, Pandas, and Scikit-Learn are built.
* Low-level data structure (`np.ndarray`, created with `np.array`)
* Large multidimensional arrays and matrices
* A wide range of mathematical operations can be performed on these data structures
* `import numpy as np`
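
As a minimal sketch of the points above, the snippet below builds a small two-dimensional array and applies a few element-wise and per-axis operations. The variable names (`grid`, `doubled`, `col_sums`) are illustrative only.

```python
import numpy as np

# A 2-D array (2 rows, 5 columns) built from nested Python lists
grid = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
])

doubled = grid * 2            # element-wise multiplication, no loop needed
col_sums = grid.sum(axis=0)   # sum down each column
overall_mean = grid.mean()    # mean of every element

print(grid.shape)
print(doubled)
print(col_sums)
print(overall_mean)
```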

<strong>Pandas</strong>
-- Python library that provides high-performance, easy-to-use data structures and data analysis tools. It runs on top of NumPy (NumPy is a dependency of Pandas), so running `conda install pandas` in your terminal also installs NumPy automatically. Popular for data science, financial modeling, statistics, etc.
* High-level data structure (DataFrames)
* Better suited to tabular data (spreadsheets)
* Aligns data, fills in missing data, makes dates friendlier to work with, etc.
* `import pandas as pd`
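
To illustrate what data alignment and missing-data handling can look like, here is a small sketch with two made-up Series whose labels only partly overlap. The labels and values are hypothetical.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['y', 'z'])

# Addition aligns on index labels, not on position.
# 'x' has no partner in b, so the result there is NaN (missing).
total = a + b

# Replace the missing value with 0
filled = total.fillna(0)

print(total)
print(filled)
```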

<strong>Combined</strong>
* Use NumPy's calculation capabilities with Pandas' data structuring models to yield powerful and visual results
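
One possible way to combine the two libraries is sketched below: NumPy functions can be applied directly to a DataFrame, and a DataFrame can be converted to a plain array when needed. The column names and scores are invented for the example.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({'quiz_a': [90, 80, 70], 'quiz_b': [100, 60, 80]})

# NumPy functions accept a DataFrame and return a DataFrame with the same labels
roots = np.sqrt(scores)

# Convert to a plain NumPy array for array-style calculations
as_array = scores.to_numpy()
row_means = as_array.mean(axis=1)  # mean across each row

print(roots)
print(row_means)
```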

### Import packages

In [2]:
import numpy as np
import pandas as pd

### Create dataframe

In [3]:
user_dataframe = [
    {
        'id': 1,
        'first_name': 'Derek',
        'last_name': 'Hawkins',
        'email': 'derekh@codingtemple.com'
    },
    {
        'id': 2,
        'first_name': 'Lucas',
        'last_name': 'Lang',
        'email': 'lucasl@codingtemple.com'
    }
    
]

for i in user_dataframe:
    print(i['first_name'])

Derek
Lucas


In [4]:
### [
###     [1,2,3,4,5],
###     [6,7,8,9,10]
### ]

### Show data types, indexes, columns, values

<p>
    <i>Maximum value of a 32-bit integer (int32) = 2,147,483,647</i>
</p>
<p>
    <i>Maximum value of a 64-bit integer (int64) = 9,223,372,036,854,775,807</i>
</p>
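
These limits can be checked with `np.iinfo`. The sketch below also compares the memory used by the same three values stored as `int32` and `int64`; the variable names are illustrative.

```python
import numpy as np

int32_max = np.iinfo(np.int32).max
int64_max = np.iinfo(np.int64).max

# The same values take half the memory as int32 (4 bytes each vs 8)
small = np.array([1, 2, 3], dtype=np.int32)
large = np.array([1, 2, 3], dtype=np.int64)

print(int32_max, int64_max)
print(small.nbytes, large.nbytes)
```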

In [9]:
data = [
        ['Student A', 92, 88, 65, 99, 50],
        ['Student B', 100, 100, 100, 100, 100],
        ['Student C', 95, 88, None, 100, 99],
        ['Student D', 88, 90, 89, 100, 70],
        ['Student E', 100, 100, 100, 100, 100],
        ['Student F', 90, 45, 77, 98, 99],
        ['Student G', 70, 60, 60, 12, 65],
        ['Student H', 99, 99, 100, 100, 100],
        ['Student I', 100, 100, 100, 100, 100],
        ['Student J', 80, 88, 95, 77, 100],
    ]
# np.int32

In [15]:
# index starts at 1 so that row labels match student numbers
df = pd.DataFrame(
    data=data,
    index=list(range(1, len(data) + 1)),
    columns=['Name', 'Quiz 1', 'Quiz 2', 'Quiz 3', 'Quiz 4', 'Quiz 5']
)

df
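
A minimal sketch of inspecting data types, index, columns, and values, using a hypothetical two-row frame (`frame`) rather than the full dataset. Note that a column containing `None` is stored as `float64`, because the missing entry becomes `NaN`.

```python
import pandas as pd

rows = [
    ['Student A', 92, 65],
    ['Student C', 95, None],
]
frame = pd.DataFrame(rows, index=[1, 2], columns=['Name', 'Quiz 1', 'Quiz 3'])

print(frame.dtypes)      # one dtype per column
print(frame.index)       # row labels
print(frame.columns)     # column labels
print(frame.to_numpy())  # underlying values as a NumPy array
```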

### Statistical summary of data

In [None]:
df.describe()
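
As a rough illustration of what `describe()` returns, the sketch below runs it on a tiny hypothetical frame. By default only numeric columns are summarized, so `Name` is left out of the result.

```python
import pandas as pd

frame = pd.DataFrame({
    'Name': ['Student A', 'Student B', 'Student C'],
    'Quiz 1': [80, 90, 100],
})

summary = frame.describe()

# Rows are statistics (count, mean, std, min, quartiles, max);
# columns are the numeric columns of the original frame
print(summary)
print(summary.loc['mean', 'Quiz 1'])
```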

### Basic information about dataframe

In [8]:
df.info()


In [7]:
# Work with float numbers and display int numbers when necessary
# float('1.0')
# int('1')

# 1.0 * 6

1

### Sort all values by certain criteria

In [None]:
df.sort_values('Name', ascending=False)

In [None]:
df.sort_values('Quiz 4', ascending=False)
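
Sorting can also use more than one column. The sketch below, on a small hypothetical frame, sorts by score (highest first) and breaks ties alphabetically by name.

```python
import pandas as pd

frame = pd.DataFrame({
    'Name': ['Student C', 'Student A', 'Student B'],
    'Quiz 1': [100, 92, 100],
})

# One ascending flag per sort key: scores descending, names ascending
ranked = frame.sort_values(['Quiz 1', 'Name'], ascending=[False, True])

print(ranked)
```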

In [None]:
df.transpose()  # returns a transposed copy; df itself is unchanged
df

### Slicing data

In [None]:
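# bracket notation (works for any column name)
df['Quiz 1']

# dot notation (only works when the column name is a valid Python identifier,
# so it cannot be used for 'Quiz 1', which contains a space)
df.Name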
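
Rows and columns can also be sliced together with `.loc` (label-based) and `.iloc` (position-based). This is a sketch on a hypothetical three-row frame; note that `.loc` slices include the end label while `.iloc` slices exclude the end position.

```python
import pandas as pd

frame = pd.DataFrame(
    {'Name': ['Student A', 'Student B', 'Student C'], 'Quiz 1': [92, 100, 95]},
    index=[1, 2, 3],
)

# Label-based: rows labelled 1 through 2 (inclusive), selected columns
by_label = frame.loc[1:2, ['Name', 'Quiz 1']]

# Position-based: first two rows (end position excluded)
by_position = frame.iloc[0:2]

print(by_label)
print(by_position)
```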

### Filtering

In [14]:
# boolean-mask filtering (comparable to a SQL WHERE clause)
df[df['Quiz 1'] > 90]

In [None]:
# .query method (column names containing spaces must be wrapped in backticks)
df.query('`Quiz 1` > 90')

In [None]:
df.query('Name == "Student A"')

In [None]:
# .isin() = select which students to show records for
df[df.Name.isin(['Student I', 'Student B'])]
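
Several conditions can be combined in one filter. The sketch below uses a hypothetical three-row frame; each condition needs its own parentheses, with `&` for "and" and `|` for "or".

```python
import pandas as pd

frame = pd.DataFrame({
    'Name': ['Student A', 'Student B', 'Student G'],
    'Quiz 1': [92, 100, 70],
    'Quiz 2': [88, 100, 60],
})

# Both quizzes above 90
both = frame[(frame['Quiz 1'] > 90) & (frame['Quiz 2'] > 90)]

# At least one quiz above 90
either = frame[(frame['Quiz 1'] > 90) | (frame['Quiz 2'] > 90)]

print(both)
print(either)
```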

In [None]:
df.info()

### Assignment

In [None]:
# Teacher curved everyone's Quiz 1 score to 100
df.loc[:, 'Quiz 1'] = 100

In [None]:
df

In [None]:
# average of the five quiz scores; a missing score makes that row's average NaN
df['Averages'] = (df['Quiz 1'] + df['Quiz 2'] + df['Quiz 3'] + df['Quiz 4'] + df['Quiz 5']) / 5
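
As an alternative sketch (not the approach used in the cell above), `DataFrame.mean(axis=1)` computes a per-row average and skips missing values by default. The two-student frame below is hypothetical and shows how the two approaches differ when a score is missing.

```python
import pandas as pd

frame = pd.DataFrame(
    {'Quiz 1': [95, 88], 'Quiz 2': [88, 90], 'Quiz 3': [None, 89]},
    index=['Student C', 'Student D'],
)

# Manual sum: any missing score turns the whole result into NaN
manual = (frame['Quiz 1'] + frame['Quiz 2'] + frame['Quiz 3']) / 3

# mean(axis=1): averages only the scores that are present in each row
row_mean = frame.mean(axis=1)

print(manual)
print(row_mean)
```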

In [None]:
df

In [None]:
for i in df['Quiz 1']:
    print(i)

In [None]:
df['Quiz 2']

### Rename columns

In [None]:
df.rename(columns={'Averages': 'Average'})  # returns a renamed copy; df is not modified
df

In [None]:
# df = df.rename(columns={'Averages': 'Average'})
# df

In [None]:
df.rename(columns={'Averages': 'Average'}, inplace=True)
df

In [None]:
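# lower-case every column name and replace spaces with underscores;
# assign the result back so the new names are kept
df = df.rename(lambda c: c.lower().replace(' ', '_'), axis=1)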

### Iterate over dataframe

In [None]:
for idx, row in df.iterrows():
    print(idx, row['name'], row['average'])
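
A different technique, `itertuples()`, is sketched below as an alternative to `iterrows()`. It yields one namedtuple per row, so columns whose names are valid identifiers can be read as attributes. The frame and its values are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Student A', 'Student B'], 'average': [78.8, 100.0]},
    index=[1, 2],
)

collected = []
for row in frame.itertuples():
    # row.Index is the row label; other fields match the column names
    collected.append((row.Index, row.name, row.average))
    print(row.Index, row.name, row.average)
```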

### Save to CSV file

In [None]:
# store multiple versions
# prevents losing data via cleaning data
df.to_csv('stuff_v1.csv')

### Load data from CSV file into Jupyter Notebook as a Pandas dataframe

In [None]:
new_df = pd.read_csv('stuff_v1.csv')
new_df
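
When a dataframe is saved with `to_csv`, its index is written as an unnamed first column, which `read_csv` loads back as a regular column called `Unnamed: 0` unless told otherwise. The sketch below shows one way to restore the index with `index_col=0`. It writes to an in-memory buffer instead of a file, and the frame is hypothetical.

```python
import io
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Student A', 'Student B'], 'average': [78.8, 100.0]},
    index=[1, 2],
)

# Write the CSV to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
frame.to_csv(buffer)

# Default read: the saved index becomes an extra 'Unnamed: 0' column
buffer.seek(0)
default_read = pd.read_csv(buffer)

# index_col=0: use the first CSV column as the index again
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(default_read.columns.tolist())
print(restored)
```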