# Introduction to NumPy

* Overview
* ndarray
* Indexing and Slicing

More info: [http://wiki.scipy.org/Tentative_NumPy_Tutorial](http://wiki.scipy.org/Tentative_NumPy_Tutorial)

## NumPy Overview

* Why Python for data? NumPy brings decades of C math into Python!
   * NumPy provides wrappers around extensive C/C++/Fortran codebases for data analysis and numerical computing
* The ndarray allows easy vectorized math and broadcasting (i.e. element-wise operations between arrays of different but compatible shapes)
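
As a minimal sketch of the two ideas above, the following example applies arithmetic to a whole array at once and then combines two arrays of different shapes. The variable names and values are made up for illustration.

```python
import numpy as np

# Vectorized math: the operation is applied element-wise, with no Python loop.
prices = np.array([10.0, 20.0, 30.0])
doubled = prices * 2

# Broadcasting: a (3, 1) column combined with a (1, 4) row yields a (3, 4) grid.
column = np.arange(3).reshape(3, 1)   # [[0], [1], [2]]
row = np.arange(4).reshape(1, 4)      # [[0, 1, 2, 3]]
grid = column * 10 + row

print(doubled)
print(grid.shape)
print(grid)
```

Each entry of `grid` is `10 * i + j`, so NumPy has effectively stretched the column across four columns and the row across three rows without copying the data by hand.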

In [1]:
import datetime

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import style
from pandas_datareader import data as web  # replaces the old pandas.io.data module

style.use('ggplot')

Note: the pandas.io.data module was moved to a separate package, pandas-datareader (https://github.com/pydata/pandas-datareader), and removed from pandas. After installing that package, change the import ``from pandas.io import data, wb`` to ``from pandas_datareader import data, wb``.


## Convert Dictionaries to DataFrame

In [None]:
GA_DAT = {'Class': [1, 2, 3, 4, 5],
          'Registered': [20, 30, 40, 35, 10],
          'Graduated': [19, 28, 35, 20, 10]}
df = pd.DataFrame(GA_DAT)  # Convert a dictionary to a DataFrame
print(df)  # Look at the default index: do you like it? (I don't!)

## Let's change the index to Class

In [None]:
df = df.set_index('Class')
print(df)

In [None]:
df.head()


In [None]:
print(df.Graduated)
print(df['Graduated'])

In [None]:
print(df[['Graduated','Registered']])

In [None]:
df.describe()  #Summary of your data frame

In [None]:
df.describe()['Graduated']['25%']
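
The lookup above reads the first quartile out of the summary table. As a small sketch, the same number can be obtained with `.loc` on the summary or directly with `Series.quantile`; the data below simply repeats the class dictionary from earlier.

```python
import pandas as pd

df = pd.DataFrame({'Class': [1, 2, 3, 4, 5],
                   'Registered': [20, 30, 40, 35, 10],
                   'Graduated': [19, 28, 35, 20, 10]}).set_index('Class')

summary = df.describe()

# Row label first, column label second: a single lookup instead of two chained ones.
q1_from_describe = summary.loc['25%', 'Graduated']

# The same statistic computed directly on the column.
q1_direct = df['Graduated'].quantile(0.25)

print(q1_from_describe, q1_direct)
```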

# Now let's change DataFrames to Arrays

In [None]:
MyArray = np.array(df[['Graduated','Registered']])
print(MyArray)
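
As an alternative sketch, recent pandas versions also offer `DataFrame.to_numpy()`, which gives the same array without calling `np.array` yourself. Once the data is an ndarray, NumPy reductions apply directly. The data again repeats the class dictionary from earlier.

```python
import pandas as pd

df = pd.DataFrame({'Class': [1, 2, 3, 4, 5],
                   'Registered': [20, 30, 40, 35, 10],
                   'Graduated': [19, 28, 35, 20, 10]}).set_index('Class')

# Convert the two selected columns to a 2-D NumPy array (rows = classes).
my_array = df[['Graduated', 'Registered']].to_numpy()

# Column-wise totals: axis=0 collapses the rows.
totals = my_array.sum(axis=0)

print(my_array.shape)
print(totals)
```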

## Grouping

In [None]:
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

df

In [None]:
df.groupby(['A','B']).mean()
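
Because the columns above are random, the group means change on every run. The sketch below uses a small fixed table (values invented for illustration) so the result of grouping by two keys can be checked by hand.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'one'],
                   'C': [1.0, 2.0, 3.0, 4.0]})

# Group rows that share the same (A, B) pair, then average the numeric column.
means = df.groupby(['A', 'B']).mean()

print(means)

# The result is indexed by (A, B) tuples.
bar_one = means.loc[('bar', 'one'), 'C']   # average of 2.0 and 4.0
print(bar_one)
```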

# Now Let's Play with Web Data

In [None]:
start = datetime.datetime(2010,1,1)
end = datetime.datetime(2016,1,29)

df = web.DataReader("TSLA","yahoo",start,end)
print(df.head())
print(df.tail())


df["Adj Close"].max()



In [None]:
df['Close'].plot()
plt.show()

# Input/output almost any type of Data using Pandas
Using the pandas I/O API, you can read from, convert to, or write to almost any data format. Please refer to <http://pandas.pydata.org/pandas-docs/stable/io.html> for more info.
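
As a rough sketch of the read/write symmetry, the example below writes a tiny DataFrame to CSV text in memory and reads it back. The column names mirror the file used later in this notebook, but the values are made up, and an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical sample data; not taken from the real CSV file.
original = pd.DataFrame({'Date': ['10/31/15', '11/30/15'],
                         'Value': [1080000, 1090000]})

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
original.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```

Most `read_*` functions have a matching `to_*` method, so the same pattern applies to other formats.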

## Let's read csv

You can read any file from your local computer by referring to the full path of your data.

In [None]:
df1 = pd.read_csv('/Users/hamed/Desktop/SF-DAT-20/Data/SF-Median-Prices.csv') 
print(df1)

In [None]:
import os
os.getcwd()  # show the current working directory

In [None]:
os.chdir('/Users/hamed/Desktop/SF-DAT-20/Data/') #PLEASE SET THIS WORKING DIRECTORY TO YOURS


In [None]:
df2 = pd.read_csv('SF-Median-Prices.csv')
df2.head()

In [None]:
df2['Value'].plot()
plt.show()
#How do you like this graph? What is going wrong? 

In [None]:
df2 = df2.set_index('Date')


In [None]:
df2.head()

In [None]:
df2['Value'].plot()
plt.show() #the graph looks better - at least has correct labels

## Bonus Question: Can you fix the plot so that it shows earlier dates before later dates?
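
One possible approach, sketched under the assumption that the dates are strings such as '12/31/15' listed newest first: convert the index to real datetimes and sort it. The three rows below are invented stand-ins for the CSV contents.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data, newest date first.
df2 = pd.DataFrame({'Date': ['12/31/15', '11/30/15', '10/31/15'],
                    'Value': [1100000, 1090000, 1080000]}).set_index('Date')

# Parse the string labels as dates (month/day/two-digit year is an assumption).
df2.index = pd.to_datetime(df2.index, format='%m/%d/%y')

# Sort chronologically so earlier dates come first.
df2 = df2.sort_index()

print(df2)
```

Calling `df2['Value'].plot()` after this step should draw the series from left to right in time order.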

Now let's save our DataFrame in HTML format

In [None]:
df2.to_html('SFData.html') #check your working directory

## Now let's rename columns in our DataFrame

In [None]:
df2 = df2.rename(columns={'Value': 'SFHomeValue'})

In [None]:
df2.head()

In [None]:
df2.loc['12/31/15':'10/31/15', ['SFHomeValue']]  # Access a portion of the DataFrame by index label
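
A small sketch of how label slicing with `.loc` behaves: unlike positional slicing, both endpoints are included. The four rows below are invented and assume the index holds date strings in newest-first order.

```python
import pandas as pd

# Hypothetical data with a string date index, newest first.
df2 = pd.DataFrame({'Date': ['12/31/15', '11/30/15', '10/31/15', '09/30/15'],
                    'SFHomeValue': [1100000, 1090000, 1080000, 1070000]}).set_index('Date')

# Label slice: starts at '12/31/15' and stops at '10/31/15', inclusive.
subset = df2.loc['12/31/15':'10/31/15', ['SFHomeValue']]

print(subset)
```

Passing the column name inside a list keeps the result a DataFrame rather than a Series.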

In [None]:
df2.count()

In [None]:
df2.describe()

In [None]:
df2.sort_values(by='SFHomeValue')  # DataFrame.sort() was removed from pandas; sort_values() replaces it

In [None]:
df3 = df2[df2.SFHomeValue > 1000000]


In [None]:
pd.isnull(df3)  # Look for missing values: returns a boolean DataFrame that is True wherever a value is missing
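
The boolean result is often more useful once it is summarized. As a sketch, using a made-up column with one missing value, the mask can be reduced to a count per column or to a single yes/no answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing entry.
df3 = pd.DataFrame({'SFHomeValue': [1050000.0, np.nan, 1200000.0]})

# Element-wise mask: True where the value is missing.
mask = pd.isnull(df3)

# True counts as 1, so summing gives the number of missing values per column.
missing_per_column = mask.sum()

# Collapse columns, then the whole frame, into a single boolean.
has_any_missing = bool(mask.any().any())

print(mask)
print(missing_per_column)
print(has_any_missing)
```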

# Next Steps

**Recommended Resources**

Name | Description
--- | ---
[Official Pandas Tutorials](http://pandas.pydata.org/pandas-docs/stable/tutorials.html) | Wes & Company's selection of tutorials and lectures
[Julia Evans Pandas Cookbook](https://github.com/jvns/pandas-cookbook) | Great resource with examples from weather, bikes and 311 calls
[Learn Pandas Tutorials](https://bitbucket.org/hrojas/learn-pandas) | A great series of Pandas tutorials from Dave Rojas
[Research Computing Python Data PYNBs](https://github.com/ResearchComputing/Meetup-Fall-2013/tree/master/python) | A super awesome set of python notebooks from a meetup-based course exclusively devoted to pandas