# Python Libraries
NumPy and pandas are two of the most widely used Python libraries for working with numeric and tabular data.

## Import - Specify what library to use
```
import numpy as np
import pandas as pd
```
To import only a single name from a library (this saves typing the `np.` prefix; the whole library is still loaded, so it does not save memory):
```
from numpy import mean
...
mean([1, 2, 3])
```

## Numpy
A library for math and numeric processing.
* It is important to note that some operations are in-place, meaning the operation mutates the original object or value.
* Other operations are not in-place; they return the result as a new object instead.
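
To make the in-place distinction concrete, here is a minimal sketch using sorting. The array values and variable names are illustrative and not part of the original notebook. `ndarray.sort()` mutates the array it is called on and returns `None`, while `np.sort()` leaves its input untouched and returns a sorted copy.

```python
import numpy as np

values = np.array([3, 1, 2])

# Not in-place: np.sort returns a new sorted array; `values` is unchanged.
sorted_copy = np.sort(values)
print(values)       # [3 1 2]
print(sorted_copy)  # [1 2 3]

# In-place: ndarray.sort mutates `values` itself and returns None.
result = values.sort()
print(values)  # [1 2 3]
print(result)  # None
```

A quick way to tell the two apart is to check the return value: in-place methods usually return `None`.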

In [3]:
import numpy as np

In [2]:
numbers = [1, 2, 3, 4, 5]
type(numbers)

list

In [3]:
print ("mean: ", np.mean(numbers))
print ("average: ", np.average(numbers))

mean:  3.0
average:  3.0
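
The two functions agree above because no weights were given. As a small sketch of where they differ, `np.average` accepts a `weights` argument, while `np.mean` does not. The weights below are hypothetical values chosen only for illustration.

```python
import numpy as np

numbers = [1, 2, 3, 4, 5]
weights = [5, 4, 3, 2, 1]  # hypothetical weights, for illustration only

# Without weights, both functions compute the arithmetic mean.
plain_mean = np.mean(numbers)
plain_average = np.average(numbers)

# With weights, np.average computes sum(w * x) / sum(w).
weighted_average = np.average(numbers, weights=weights)

print("mean:", plain_mean)
print("average:", plain_average)
print("weighted average:", weighted_average)
```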


### numpy has its own array type: numpy.ndarray

In [4]:
a = np.linspace(1, 7, 5)  # 5 evenly spaced values from 1 to 7, inclusive
print("linear space: ", a)
print("type of a: ", type(a))

linear space:  [1.  2.5 4.  5.5 7. ]
type of a:  <class 'numpy.ndarray'>
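
One practical reason the array type matters is that arithmetic on an `ndarray` is applied element by element, whereas the same operator on a list has sequence semantics. The following is a minimal sketch with made-up values, not part of the original notebook.

```python
import numpy as np

numbers = [1, 2, 3]
array = np.array(numbers)

# For a list, * repeats the sequence.
doubled_list = numbers * 2
# For an ndarray, * multiplies each element.
doubled_array = array * 2

print(doubled_list)   # [1, 2, 3, 1, 2, 3]
print(doubled_array)  # [2 4 6]

# An ndarray also carries a single element type and a shape.
print(array.dtype, array.shape)
```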


### convert between list and numpy.ndarray

In [5]:
nums_np = np.array(numbers)
print("type(numbers): ", type(numbers))
print("type(nums_np): ", type(nums_np))

nums_list = nums_np.tolist()
print("type(nums_list): ", type(nums_list))

type(numbers):  <class 'list'>
type(nums_np):  <class 'numpy.ndarray'>
type(nums_list):  <class 'list'>


## Pandas
pandas is a library for loading, manipulating, and analyzing tabular data sets.

In [5]:
import pandas as pd

### DataFrame
A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet. A DataFrame has the following parts:
* Each _row_ represents a record.
* Each _column_ represents a variable or _feature_ in the dataset. Columns have data types and names.
* The _index_ uniquely identifies each row in the table.
* _Data_ (each _cell_) holds the actual data values.
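
The parts listed above can be inspected directly on a DataFrame. This is a minimal sketch; the records, column names, and index labels are hypothetical and exist only to show where each part lives.

```python
import pandas as pd

# Hypothetical records, used only to illustrate the parts of a DataFrame.
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Cairo"], "temp_c": [4.5, 19.0, 28.5]},
    index=["a", "b", "c"],
)

print(df.index.tolist())      # row labels: ['a', 'b', 'c']
print(df.columns.tolist())    # column names: ['city', 'temp_c']
print(df.dtypes)              # one data type per column
print(df.loc["b"])            # one row (record), selected by index label
print(df.loc["b", "temp_c"])  # one cell: 19.0
```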

In [None]:
# Create a DataFrame from a 2-D array: 4 rows, each an array of 3 elements
data = np.random.randn(4, 3)
print("Data array 'data':")
print(data)
df = pd.DataFrame(data=data)
print("DataFrame df:")
print(df)

#### Constructing DataFrame
___Important:___ Each column of a DataFrame can be an array, but that array must be one-dimensional.

Typical Python error when this limitation is violated: `"ValueError: Per-column arrays must each be 1-dimensional"`

In [None]:
# create data set: two 3x2 (2-dimensional) arrays
var1 = np.random.randn(3, 2) * 5
var2 = np.random.randn(3, 2) + 20

# variable labels
labels = ['Temp', 'Ice cream']

# Compose a dictionary (named data_dict to avoid shadowing the built-in dict)
data_dict = {labels[0]: var1, labels[1]: var2}
print("dict: ", data_dict)

# Build a pandas DataFrame; this line raises the ValueError shown below
df = pd.DataFrame(data=data_dict)
print("dataframe: ", df)

# pandas tries to fit 'data_dict' into a DataFrame by:
#  - placing the data of key 'Temp' in the first column
#  - placing the data of key 'Ice cream' in the second column
# However, both 'Temp' and 'Ice cream' map to a 2-dimensional array, which violates the requirement of DataFrame construction.


dict:  {'Temp': array([[1.10482479, 3.31318828],
       [1.79081594, 8.83284042],
       [2.11512438, 1.75888225]]), 'Ice cream': array([[21.4244878 , 19.19705313],
       [19.58061568, 19.12207088],
       [20.45794567, 18.57702693]])}


ValueError: Per-column arrays must each be 1-dimensional
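
Two possible ways around this error are sketched below, assuming the goal is simply to get the same values into a DataFrame. Which one is right depends on what the two array dimensions mean. The suffixed column names in the second option are hypothetical.

```python
import numpy as np
import pandas as pd

# Same shapes as the failing cell: two 3x2 arrays.
var1 = np.random.randn(3, 2) * 5
var2 = np.random.randn(3, 2) + 20

# Option 1: flatten each 2-D array to 1-D so it fits in a single column.
df_flat = pd.DataFrame({"Temp": var1.ravel(), "Ice cream": var2.ravel()})
print(df_flat.shape)  # (6, 2)

# Option 2: turn each 2-D array into its own DataFrame and join them
# side by side; the column names here are hypothetical.
df_wide = pd.concat(
    [
        pd.DataFrame(var1, columns=["Temp_0", "Temp_1"]),
        pd.DataFrame(var2, columns=["Ice cream_0", "Ice cream_1"]),
    ],
    axis=1,
)
print(df_wide.shape)  # (3, 4)
```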

#### Exercise
Create a pandas DataFrame with three columns: 1) the integers from 0 to 10, 2) their squares, and 3) their natural logarithms.

In [None]:
num = np.arange(0, 11)  # integers 0 through 10
sqr = np.square(num)
log = np.log(num)
dataframe = pd.DataFrame({
    "num": num,
    "square": sqr,
    "log": log})
dataframe

# NumPy emits a RuntimeWarning ("divide by zero encountered in log") because log(0) evaluates to -inf.
# The rest of the execution still succeeds.

RuntimeWarning: divide by zero encountered in log
  log = np.log(num)


Unnamed: 0,num,square,log
0,0,0,-inf
1,1,1,0.0
2,2,4,0.693147
3,3,9,1.098612
4,4,16,1.386294
5,5,25,1.609438
6,6,36,1.791759
7,7,49,1.94591
8,8,64,2.079442
9,9,81,2.197225
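
If the `-inf` entry or the warning is unwanted, there are at least two ways to handle `log(0)`. The sketch below is one possible approach, not the notebook's own solution, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

num = np.arange(0, 11)

# Option 1: silence the divide-by-zero warning for this block only.
# log(0) still evaluates to -inf.
with np.errstate(divide="ignore"):
    log_with_inf = np.log(num)

# Option 2: compute the log only where num > 0 and leave NaN elsewhere.
log_with_nan = np.full(num.shape, np.nan)
np.log(num, out=log_with_nan, where=num > 0)

dataframe = pd.DataFrame(
    {"num": num, "square": np.square(num), "log": log_with_nan}
)
print(dataframe.head(3))
```

Using `NaN` rather than `-inf` marks the value as missing, which pandas skips by default in aggregations such as `mean()`.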
