[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/joshmaglione/CS102-Jupyter/main?labpath=.%2FWeek12.ipynb) 

<a href="https://colab.research.google.com/github/joshmaglione/CS102-Jupyter/blob/main/Week12.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> 

[View on GitHub](https://github.com/joshmaglione/CS102-Jupyter/blob/main/Week12.ipynb)

# Week 12: Wrap up

Goals: 
- Overview of neural networks
- Long-term perspective (~5 years)
- Short-term perspective (~1 month)

Although we will wrap up the semester, I want to continue discussing machine learning.

In particular, I want to give an overview of (deep) neural networks.

## Neural networks

You can build neural networks using many different Python packages.

- [PyTorch](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)
- [Scikit-Learn](https://scikit-learn.org/stable/modules/neural_networks_supervised.html)
- [TensorFlow](https://www.tensorflow.org/guide/keras)

Structurally, a neural network is a mathematical object called a [*graph*](https://en.wikipedia.org/wiki/Graph_theory): a collection of nodes joined by edges.

Nodes (or vertices) are called *neurons*.

![](https://tamaszilagyi.com/blog/2017/2017-11-11-animated_net_files/figure-html/fig2-1.png)

Neurons are grouped in layers, usually from left to right.

- The first layer is the *input layer*.
- The final layer is the *output layer*.
- The rest of the layers are *hidden layers*.

Each edge carries a weight, which is a real number. The weights are the *tunable parameters* of the network.

Each neuron holds a value and has an associated mathematical function.
- It sums the values of all incoming neurons, each multiplied by the weight of the corresponding edge.
- It applies its function to that sum.
- The result is the value of the neuron.
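
The steps above can be sketched in a few lines of NumPy. This is only an illustration, not how any particular library implements a neuron: the incoming values, the weights, and the choice of the sigmoid as the neuron's function are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """One common choice for a neuron's function (an assumption here)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(incoming_values, weights, activation=sigmoid):
    """Weighted sum of the incoming values, followed by the neuron's function."""
    weighted_sum = np.dot(incoming_values, weights)
    return activation(weighted_sum)

# Hypothetical values of three neurons in the previous layer
incoming_values = np.array([1.0, 0.5, -2.0])
# Hypothetical weights on the three incoming edges
weights = np.array([0.5, -1.0, 0.0])

# Weighted sum: 1.0*0.5 + 0.5*(-1.0) + (-2.0)*0.0 = 0.0, and sigmoid(0.0) = 0.5
value = neuron_output(incoming_values, weights)
print(value)
```

Changing any of the weights changes the neuron's value, which is why the weights are the quantities that get tuned.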

The neural network "learns" through a large number of rounds of trial and error.

Mathematical optimization methods are used to tune the parameters in a smart way.
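
One widely used optimization method for tuning parameters is gradient descent. The sketch below is a toy version with a single weight `w`, tuned so that `w * x` matches a target `y`. The training example, the squared-error loss, the learning rate, and the number of steps are all assumed values chosen for illustration.

```python
# Tune one weight w so that the prediction w * x matches the target y.
x, y = 2.0, 6.0          # hypothetical training example (the best weight is 3.0)
w = 0.0                  # initial guess for the tunable parameter
learning_rate = 0.1      # step size (an assumed value)

for step in range(50):
    prediction = w * x
    error = prediction - y
    gradient = 2 * error * x        # derivative of (w*x - y)**2 with respect to w
    w -= learning_rate * gradient   # move w in the direction that lowers the loss

print(w)  # very close to 3.0
```

Each pass through the loop is one "trial": make a prediction, measure the error, and nudge the weight to reduce it. A real network does the same for all of its weights at once.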

![](https://raw.githubusercontent.com/mtoto/mtoto.github.io/master/data/2017-11-08-net/result.gif)

The AI world is moving at breakneck speed. 

There has been a lot of news just during this semester! (And I do not actively follow it...)

- [Google's AlphaGeometry](https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/) (January 2024)
- [OpenAI's Sora announcement](https://openai.com/sora) (February 2024)
- [Google's Gemini (1.5)](https://deepmind.google/technologies/gemini/#introduction) (February 2024)

# Long-term: getting a job

You might have found everything I did in this lecture boring. 

(If this really is you, try some of the exercises. Is it still boring? If so, adjust course accordingly.)

What did you find interesting? 
- No need to tell me, but I *am* curious.

Start building a portfolio of work.

## How to get started? 

- There is a significant code-base that is open source and on GitHub.
- There are lots of developers, employers, and amateurs (people who are not paid to code) on GitHub.

Get on GitHub, build a website for yourself, and put your work out there for people to see.

### Examples (not exhaustive!)

1. Find an open-source project you like and try to get involved. 
	- Fork the repo, try to improve it, and create a pull request. For example, you could fix a bug or add a feature. 

2. Build some code for a particular (and very specific) task. 
	- Create a (public) repo on GitHub, where users can download your code. There should be some documentation, so that users know what is going on.

3. Find an interesting data set and tell a story.
	- Learn the details of the data set. How was it collected? Take your data visualization skills and teach someone about your data. Please be responsible.
  
4. Build a neural network to do something.
	- Take an NN off the shelf or build one yourself from scratch. See if you can get it to do what you want.

Starting a portfolio right now (in year 1) is a great way to 
- figure out what you actually enjoy doing
- build a large body of work
- prepare for future modules
- guide your decisions for short- to mid-term goals

### But I don't know anything!

OK, you don't have to do anything I suggest. 

**Anecdote:** Don't tell anyone, but I've never taken a computer science module in my life. 
- My research is in computational mathematics and I develop polynomial-time algorithms for specific problems concerning tensors and algebras. 

# Short-term: preparing for the exam

Long-term goals are great to think about, but we also need to be **pragmatic**!

## Jupyter/Python Basics
- Importing Packages
- IPython/Jupyter line/cell magic, e.g. `%timeit`
- Help syntax
- Input/Output history

## Data Types and Numpy
- Static vs Dynamic typing
- Integer Type in Python vs Numpy
- Python List vs Numpy Array
- Differences in Memory and Efficiency Python vs Numpy
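
As a quick reminder of the items above, here is a small sketch contrasting Python and NumPy integers and containers. The specific numbers are arbitrary, chosen only to make the differences visible.

```python
import numpy as np

# A Python int has arbitrary precision, so it never overflows.
big = 2**62
print(big * 4)                          # exact value of 2**64

# A NumPy integer has a fixed size given by its dtype.
a = np.array([1, 2, 3], dtype=np.int64)
print(a.dtype, a.itemsize, a.nbytes)    # 8 bytes per item, 24 bytes of data

# Fixed-size integers wrap around when they overflow.
wrapped = np.array([127], dtype=np.int8) + np.int8(1)
print(wrapped)                          # [-128]

# A Python list can mix types; a NumPy array stores a single dtype.
mixed = [1, "two", 3.0]
b = np.array([1, 2, 3.5])               # the integers are upcast to float64
print(b.dtype)
```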

## Numpy Syntax & Usage
- `.arange` and slices use the half-open interval [start, stop): start is included and stop is excluded. 
- Array creation
	- `.zeros` and `.ones` and `.eye`
	- `.full` and `.empty`
	- `.arange([start, ]stop, [step, ])` and `.linspace(start, stop, num)`
	- `.random*` 
- Numpy Data Types {int*, uint*, float*, complex*}
- Comparison of Numpy/Python arrays
- Array Attributes 
	- `.dtype`
	- `.shape`
	- `.itemsize`
	- `.nbytes`
- Indexing Numpy Arrays
- Slicing Numpy Arrays `[start:stop:step]`
- Array copying & Reshaping 
	- `.newaxis`
	- `.split`
- Ufuncs
	- Meaning & advantage
	- Binary & Unary
	- arithmetic/trig/exp/log
- Aggregates
	- `.reduce`
	- `.accumulate`
	- `.sum` and `.prod`
- Broadcasting Rules
- Boolean Arrays
	- via comparisons
	- counting `.sum` and `.any` and `.all`
	- Masks & use via fancy indexing
	- Boolean Operators AND `&` and OR `|` and NOT `~` and XOR `^`
- Array `.sort` and `.argsort`
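
A compact sketch touching several of the items above (array creation, slicing, broadcasting, boolean masks, aggregates, and sorting). The arrays are made up for illustration, and this is a revision aid rather than a complete tour.

```python
import numpy as np

# arange uses the half-open interval [start, stop); linspace includes stop.
a = np.arange(0, 10, 2)          # array([0, 2, 4, 6, 8])
b = np.linspace(0, 1, 5)         # array([0., 0.25, 0.5, 0.75, 1.])

# Slicing with [start:stop:step]
print(a[1:4])                    # [2 4 6]
print(a[::-1])                   # [8 6 4 2 0]

# Broadcasting: a column of shape (3, 1) combined with a row of shape (3,)
col = np.arange(3)[:, np.newaxis]
row = np.array([10, 20, 30])
table = col + row                # result has shape (3, 3)

# Boolean arrays: compare, count, and mask
mask = (a > 2) & (a < 8)         # array([False, False, True, True, False])
print(mask.sum(), mask.any(), mask.all())
print(a[mask])                   # [4 6]

# Ufunc aggregates
print(np.add.reduce(a))          # 20
print(np.add.accumulate(a))      # [ 0  2  6 12 20]

# Sorting: sorted values, and the indices that would sort the array
c = np.array([3, 1, 2])
print(np.sort(c), np.argsort(c)) # [1 2 3] [1 2 0]
```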

## Pandas Syntax & Usage
- Objects
	- `.Series`
	- `.DataFrame`
	- (`.Index`)
- Series object properties
	- `.values`
	- `.index`
- Indexing without `.loc` and `.iloc`
- Indexing with `.loc` and `.iloc`
- DataFrame object properties
	- `.value_counts`
	- `.index`
	- `.columns`
- Sub-dataframes using Masking, fancy indexing, and `.query`
- Missing values `NaN` and use of `.dropna`
- DataFrame as
	- list of Series
	- Dictionary of Series
	- Array 
- Index as an immutable ordered set	
- Mathematical operations on DataFrames
	- Index preservation
	- Ufuncs or Pandas methods
- Join Operations on DataFrames
	- `.concat` and `.merge`
	- merge types 
		- 1-to-1, 
		- many-to-1,
		- many-to-many
	- `how=`
		- inner (Intersection of Indices)
		- outer (Union of Indices)
- Using `.dropna`, `.isnull`, and `.any`
- `DataFrame.plot` (By default, matplotlib is used)
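
To make the `how=` options and the missing-value methods above concrete, here is a minimal sketch. The two small tables and their column names are hypothetical, invented only for this example.

```python
import pandas as pd

# Hypothetical data for illustration
left = pd.DataFrame({"name": ["Ann", "Bob", "Cat"], "group": ["A", "B", "A"]})
right = pd.DataFrame({"name": ["Bob", "Cat", "Dan"], "score": [70, 85, 90]})

inner = pd.merge(left, right, on="name", how="inner")   # names present in both
outer = pd.merge(left, right, on="name", how="outer")   # names present in either

print(inner)
print(outer)

# The outer merge introduces NaN wherever a value is missing.
print(outer.isnull().any())      # which columns contain a missing value
print(outer.dropna())            # keep only the complete rows

# .loc uses labels; .iloc uses integer positions
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s.loc["b"], s.iloc[1])     # both give 20
```

Here the inner merge keeps only Bob and Cat, while the outer merge keeps all four names and fills the gaps with `NaN`.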

## Matplotlib Syntax & Usage
- `plt.plot(x,f(x))`
- `plt.scatter(x,y)`
- `plt.hist(data)`
- Subplots with `plt.subplots(i, j)`
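
The four items above can be combined in one figure. In this sketch the data is made up, and the methods are called on the axes returned by `plt.subplots`; they mirror `plt.plot`, `plt.scatter`, and `plt.hist`.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
rng = np.random.default_rng(0)      # seeded so the figure is reproducible
data = rng.normal(size=500)         # hypothetical sample for the histogram

fig, axes = plt.subplots(1, 3, figsize=(12, 3))   # 1 row, 3 columns

axes[0].plot(x, np.sin(x))          # line plot of f(x) = sin(x)
axes[0].set_title("plot")

axes[1].scatter(x, np.cos(x), s=5)  # one marker per (x, y) pair
axes[1].set_title("scatter")

axes[2].hist(data, bins=20)         # distribution of the sample
axes[2].set_title("hist")

fig.tight_layout()
```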

## Machine Learning
- What is Machine Learning?
- What are the two types of ML (with examples)?
- Supervised Learning
	- Classification (e.g. using DecisionTreeClassifier or a Random Forest)
	- Regression (e.g. using LinearRegression)
- Unsupervised Learning
	- Dimensionality Reduction (using e.g. PCA)
	- Clustering (using e.g. KMeans)
- Model Selection
	- Overfitting 
	- Hyperparameters
	- Bias/Variance & the Validation Curve
	- Learning Curve
- Data Familiarization
	- `DataFrame.shape`
	- `DataFrame.describe`
	- `pd.plotting.scatter_matrix` or `sns.pairplot`
	- Outliers
	- Scaling of Data
- Scikit-learn Syntax
	- `train_test_split`
	- `confusion_matrix`
	- `.tree.DecisionTreeClassifier(max_depth=?)`
	- `.linear_model.LinearRegression()` and `coef_` and `score(X_test, y_test)`
	- `.decomposition.PCA(n_components=?)` `pca.explained_variance_ratio_`
	- `.cluster.KMeans(n_clusters=?)` & Elbow Method
- Modern ML
	- Basic idea of (Deep) Neural Networks
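
Most of the Scikit-learn items above follow the same pattern: split the data, fit a model, then inspect or score it. Here is a minimal sketch of that pattern using `LinearRegression`. The data is synthetic (a noisy line with slope 2 and intercept 1), and the noise level, split size, and random seeds are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: y = 2x + 1 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X[:, 0] + 1 + rng.normal(scale=0.1, size=100)

# Hold out 25% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

print(model.coef_)                    # close to [2.]
print(model.score(X_test, y_test))    # R^2 on the held-out data, close to 1
```

Swapping in `DecisionTreeClassifier`, `PCA`, or `KMeans` changes the model line and the attributes you inspect, but the overall workflow looks much the same.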

Best of luck on the exam!