# Intro to Data Science @ SzISz

## Table of contents

- <a href="#Administration">administration</a>
- <a href="#Intro">intro</a>
- <a href="#Numpy">numpy</a>
- <a href="#Scipy">scipy</a>
- <a href="#Mind-blasting-stuff">mindblast</a>

## Administration

### Curriculum:
- Data Mining basics; numpy-scipy vectors and matrices
- Data Discovery; pandas-matplotlib-seaborn
- Data Transformation; pandas-sklearn
- Dimensionality Reduction; sklearn
- Classification, Regression; sklearn
- Clustering; sklearn
- Validation; sklearn
- Text Mining; sklearn-textblob-gensim
- Deep Learning; gensim-tensorflow
- Kaggle projects; all-in!

### Requirements:

A project of your choice, submitted to one of
<a href="https://www.kaggle.com/competitions">kaggle.com</a>'s competitions.

## Intro

### WTF is Data Science?

According to a random Venn diagram:

<img src="http://b-i.forbesimg.com/gilpress/files/2013/05/Data_Science_VD.png" width=300 align="left">

As a metro map: 

<a href="http://nirvacana.com/thoughts/wp-content/uploads/2013/07/RoadToDataScientist1.png" target="new">
    <img src="http://nirvacana.com/thoughts/wp-content/uploads/2013/07/RoadToDataScientist1.png" width=500 align="left">
</a>

### At the end of the day:

It's just a fancier name for Data Mining, maybe with some more hacking skills thrown into the mix.


### Who is a Data Scientist then?

- "A data scientist is a statistician who lives in San Francisco"
- "A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician."


### Thanks, much clearer now. (NOT) Can you at least tell me what a data scientist does?
#### A.k.a. the typical workflow: the Knowledge Discovery Process

<img src="http://www.cs.utexas.edu/users/csed/doc_consortium/DC99/wooley-image1.gif">

## Numpy

### 1. What is numpy?

<img src="http://orig14.deviantart.net/b39a/f/2009/244/6/1/ren_from_ren_and_stimpy_by_dragon_queen01456.jpg" width=100 align="left">

NumPy is the fundamental package for scientific computing with Python. It contains, among other things:

- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
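
The "arbitrary data-types" mentioned above are NumPy's structured dtypes: every element of the array is a record with named, typed fields. A minimal sketch; the field names and the two records are made up purely for illustration:

```python
import numpy as np

# A structured dtype: each element is a record with named, typed fields.
# The field names ("name", "age", "height") are illustrative only.
person = np.dtype([("name", "U10"), ("age", "i4"), ("height", "f8")])

people = np.array(
    [("Ren", 30, 1.60), ("Stimpy", 25, 1.75)],
    dtype=person,
)

# A whole "column" can be pulled out by field name as a regular ndarray
print(people["age"])           # [30 25]
print(people["height"].mean())
print(people[1]["name"])       # Stimpy
```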

In [None]:
import numpy as np

The basic NumPy object: the N-dimensional array (`ndarray`)

In [None]:
np.array([0, 1, 2, 3])
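
Every `ndarray` carries its own metadata. A short sketch of the most commonly used attributes (the exact integer dtype is platform-dependent, so it is not pinned down below):

```python
import numpy as np

a = np.array([0, 1, 2, 3])
print(a.ndim)    # 1
print(a.shape)   # (4,)
print(a.size)    # 4
print(a.dtype)   # platform-dependent integer type, typically int64

m = np.array([[0, 1, 2], [3, 4, 5]])
print(m.ndim, m.shape)  # 2 (2, 3)

# all elements share one dtype: mixing ints and floats upcasts to float
print(np.array([0, 1, 2.5]).dtype)  # float64
```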

How much faster is it than a regular Python list?

In [None]:
L = list(range(1000))  # a regular Python list (range() alone is a lazy range object)

In [None]:
%timeit [i**2 for i in L]

In [None]:
a = np.arange(1000)

In [None]:
%timeit a**2
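
`%timeit` is an IPython magic, so it only works inside a notebook or IPython shell. Outside of one, the standard library `timeit` module gives a comparable measurement. A minimal sketch; the loop counts are arbitrary choices and the resulting numbers depend entirely on your machine:

```python
import timeit

import numpy as np

L = list(range(1000))
a = np.arange(1000)

# run each snippet n times, repeat 5 times, keep the best run
n = 1000
py_time = min(timeit.repeat(lambda: [i**2 for i in L], number=n, repeat=5)) / n
np_time = min(timeit.repeat(lambda: a**2, number=n, repeat=5)) / n

print(f"list comprehension: {py_time * 1e6:.1f} us per loop")
print(f"numpy a**2:         {np_time * 1e6:.1f} us per loop")
print(f"speed-up: ~{py_time / np_time:.0f}x")
```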

##### To speed things up, let's use <a href="http://www.scipy-lectures.org/intro/numpy/array_object.html#creating-arrays">this</a> tutorial instead.

In [None]:
# your tutorial code goes here...
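
The "creating arrays" part of the linked tutorial revolves around a handful of constructor functions. A quick reference sketch (the specific sizes and values are arbitrary examples):

```python
import numpy as np

evens = np.arange(0, 10, 2)      # start, stop (exclusive), step -> [0 2 4 6 8]
grid = np.linspace(0, 1, 5)      # start, stop (inclusive), number of points
zeros = np.zeros((2, 3))         # 2x3 array filled with 0.0
ones = np.ones((3, 3))           # 3x3 array filled with 1.0
identity = np.eye(3)             # 3x3 identity matrix
diagonal = np.diag([1, 2, 3])    # 3x3 matrix with 1, 2, 3 on the diagonal
noise = np.random.rand(4)        # 4 uniform random samples from [0, 1)

print(evens, grid, diagonal, sep="\n")
```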

### Basic linalg operations

In [None]:
A = np.array([np.arange(1, 5)])    # row vector, shape (1, 4): the extra [] makes it "2D"
B = np.array([np.arange(1, 5)]).T  # column vector, shape (4, 1)
# alternatively:
# B = np.arange(1, 5)[:, np.newaxis]

In [None]:
A*B  # element-wise with broadcasting: (1, 4) * (4, 1) -> (4, 4)

In [None]:
A.dot(B)  # matrix product: (1, 4) x (4, 1) -> (1, 1), i.e. [[30]]
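
The two products above can be cross-checked against NumPy's named functions: broadcasting a row against a column yields the outer product, while `.dot` yields the inner (dot) product. A small sketch that rebuilds the same vectors with `np.newaxis` and uses the `@` operator (Python 3.5+), which for 2-D arrays behaves like `.dot`:

```python
import numpy as np

v = np.arange(1, 5)          # 1-D: [1 2 3 4]
A = v[np.newaxis, :]         # row vector, shape (1, 4)
B = v[:, np.newaxis]         # column vector, shape (4, 1)

outer = A * B                # broadcasting -> shape (4, 4), same as np.outer(v, v)
inner = A @ B                # matrix product -> shape (1, 1), same as np.dot(v, v)

print(outer.shape, inner)    # (4, 4) [[30]]
```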

---

In [None]:
A = np.arange(20).reshape((4, 5))
A

In [None]:
A.flatten()
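
`flatten()` always returns a copy. Its sibling `ravel()` returns a view whenever the memory layout allows it, so writes through the result can change the original array. A sketch of the difference:

```python
import numpy as np

A = np.arange(20).reshape((4, 5))

flat_copy = A.flatten()   # always a new copy
flat_view = A.ravel()     # a view here, because A is C-contiguous

flat_copy[0] = 100        # A is unchanged
print(A[0, 0])            # 0
flat_view[0] = 100        # writes through to A
print(A[0, 0])            # 100
```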

In [None]:
B = np.arange(1, 11).reshape((2, 5))
B.shape

In [None]:
B.T.shape

In [None]:
A*2

In [None]:
A*2.0
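
The two cells above produce the same values but not the same type: an integer scalar keeps the integer dtype of `A`, while a float scalar upcasts the result to float. A sketch that makes the difference visible (the exact integer dtype is platform-dependent):

```python
import numpy as np

A = np.arange(20).reshape((4, 5))

print((A * 2).dtype)    # integer, same as A (e.g. int64)
print((A * 2.0).dtype)  # float64: the float scalar upcasts the result
print(np.array_equal(A * 2, A * 2.0))  # True: equal values, different dtypes
```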

In [None]:
A*B.T  # raises ValueError: shapes (4, 5) and (5, 2) cannot be broadcast together

In [None]:
A.dot(B.T)  # matrix product: (4, 5) x (5, 2) -> (4, 2)
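
Why does `A*B.T` fail while `A.dot(B.T)` works? Element-wise operations follow NumPy's broadcasting rule: align the shapes from the right, and each pair of dimensions must be equal or contain a 1. A matrix product only needs the inner dimensions to match. Below is a hand-rolled sketch of the broadcasting rule (the helper `broadcast_shape` is hypothetical, written only for illustration; NumPy does this internally):

```python
import numpy as np


def broadcast_shape(s1, s2):
    """Hypothetical helper: predict the broadcast shape of s1 and s2, or raise ValueError."""
    n = max(len(s1), len(s2))
    # pad the shorter shape with leading 1s so both have n dimensions
    p1 = (1,) * (n - len(s1)) + tuple(s1)
    p2 = (1,) * (n - len(s2)) + tuple(s2)
    result = []
    for d1, d2 in zip(p1, p2):
        if d1 == d2 or d2 == 1:
            result.append(d1)
        elif d1 == 1:
            result.append(d2)
        else:
            raise ValueError(f"shapes {tuple(s1)} and {tuple(s2)} cannot be broadcast")
    return tuple(result)


A = np.arange(20).reshape((4, 5))
B = np.arange(1, 11).reshape((2, 5))

print(broadcast_shape((1, 4), (4, 1)))  # (4, 4): the row-times-column case
print(broadcast_shape(A.shape, (5,)))   # (4, 5): a 1-D row is stretched over every row
try:
    broadcast_shape(A.shape, B.T.shape)  # (4, 5) vs (5, 2): trailing dims 5 != 2
except ValueError as e:
    print(e)

# the matrix product only requires A's columns to match B.T's rows
print(A.shape[1] == B.T.shape[0])       # True, so A.dot(B.T) has shape (4, 2)
```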

## Scipy
<img src="http://vignette3.wikia.nocookie.net/renandstimpy/images/7/76/220px-Stimpy.jpg" width=100 align="left">

In [None]:
import scipy.sparse as sp

In [None]:
sp.eye(10)  # 10x10 sparse identity matrix

In [None]:
sp.eye(10).todense()  # the same matrix converted to a dense np.matrix

##### To speed things up, let's use <a href="http://www.scipy-lectures.org/advanced/scipy_sparse/introduction.html">this</a> tutorial instead.

In [None]:
# your tutorial code goes here...
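
The point of sparse matrices is to store only the non-zero entries. A minimal sketch with a made-up 4x4 matrix: build it in COO (coordinate) format from `(data, (row, col))` triplets, convert it to CSR, which is the usual choice for arithmetic, and multiply it by a vector without ever materialising the dense matrix:

```python
import numpy as np
import scipy.sparse as sp

# COO format: one (row, col, value) triplet per non-zero entry
row = np.array([0, 1, 3, 3])
col = np.array([0, 2, 1, 3])
data = np.array([4.0, 5.0, 7.0, 9.0])
M = sp.coo_matrix((data, (row, col)), shape=(4, 4))

# CSR (compressed sparse row): efficient for arithmetic and row slicing
M_csr = M.tocsr()
print(M_csr.nnz)        # 4 stored values out of 16 entries
print(M_csr.toarray())

# sparse matrix-vector product
x = np.ones(4)
print(M_csr @ x)        # [ 4.  5.  0. 16.]

# the identity from above, requested directly in CSR format
I = sp.eye(10, format="csr")
print(I.nnz)            # 10
```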

## Mind blasting stuff

<img src="http://vignette2.wikia.nocookie.net/nickelodeon/images/1/14/Ren%2B%2BStimpy.jpg" width=200 align=left>

### <a href="http://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.optimize.curve_fit.html">Let's fit a curve</a>

In [None]:
from scipy.optimize import curve_fit

In [None]:
def f(x, a, b, c):
    return a*x**2 + b*x + c

In [None]:
def df(x, a, b, c):
    # derivative of f with respect to x (the constant c drops out)
    return 2*a*x + b

In [None]:
x = np.linspace(0, 50, 100)
y = f(x, 0.5, 1.5, 5.5)
y_noisy = y + 0.2 * np.random.normal(size=len(x))

In [None]:
params, cov = curve_fit(f, x, y_noisy)  # fit the model f itself, not its derivative
params, cov

In [None]:
y_hat = f(x, *params)

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
plt.plot(x, y, 'bo', x, y_hat, 'r-')

In [None]:
error = np.sum(np.abs(y-y_hat))
error
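
Besides the summed error, the covariance matrix returned by `curve_fit` gives one-standard-deviation uncertainties for the fitted parameters via `np.sqrt(np.diag(cov))`. Because `f` is linear in `a`, `b` and `c`, the fit can also be cross-checked with plain polynomial least squares (`np.polyfit`). A self-contained sketch that redefines `f` and the data; the fixed seed (`default_rng(42)`) is an assumption added only to make the run reproducible:

```python
import numpy as np
from scipy.optimize import curve_fit


def f(x, a, b, c):
    return a * x**2 + b * x + c


rng = np.random.default_rng(42)  # fixed seed, for reproducibility only
x = np.linspace(0, 50, 100)
y = f(x, 0.5, 1.5, 5.5)
y_noisy = y + 0.2 * rng.normal(size=len(x))

params, cov = curve_fit(f, x, y_noisy)

# one-standard-deviation uncertainties of the fitted parameters
perr = np.sqrt(np.diag(cov))
for name, p, e in zip("abc", params, perr):
    print(f"{name} = {p:.4f} +/- {e:.4f}")

# f is a quadratic polynomial, so polyfit should land on the same solution
poly = np.polyfit(x, y_noisy, 2)  # coefficients, highest degree first: a, b, c
print(np.allclose(params, poly, atol=1e-3))
```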

### <a href="http://docs.scipy.org/doc/scipy-0.14.0/reference/tutorial/linalg.html#finding-inverse">Let's find the inverse of a matrix!</a>

In [None]:
# TODO
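
One possible solution sketch using `scipy.linalg.inv`. The matrix is an arbitrary invertible 3x3 example (its determinant is -25), and the product with the original matrix serves as a sanity check:

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 3, 5],
              [2, 5, 1],
              [2, 3, 8]])

print(linalg.det(A))  # approximately -25, non-zero, so A is invertible

A_inv = linalg.inv(A)
print(A_inv)

# A @ A^-1 should be the identity, up to floating-point rounding
print(np.allclose(A @ A_inv, np.eye(3)))  # True
```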

### <a href="http://docs.scipy.org/doc/scipy-0.14.0/reference/tutorial/linalg.html#solving-linear-system">Let's solve a linear system!</a>

In [None]:
# TODO
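
One possible solution sketch using `scipy.linalg.solve`, reusing an arbitrary invertible 3x3 matrix and an arbitrary right-hand side `b`. `solve` works from a factorisation of `A` directly, which is generally faster and numerically more reliable than forming `inv(A) @ b`:

```python
import numpy as np
from scipy import linalg

# solve A x = b for x
A = np.array([[1, 3, 5],
              [2, 5, 1],
              [2, 3, 8]])
b = np.array([10, 8, 3])

x = linalg.solve(A, b)
print(x)

print(np.allclose(A @ x, b))              # True: x satisfies the system
print(np.allclose(x, linalg.inv(A) @ b))  # True: same answer as via the inverse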