# SciPy Teen Track

**Part 1: Monday Morning Session (Before Break)**

**Emily Quinn Finney**

## Introduction
Today we are going to review the basics of Python, including some of its
special built-in objects. We will also study other SciPy packages useful
for data analysis. These include NumPy ("Numerical Python"), which helps
analyze numerical data with many-dimensional tables called arrays, and
matplotlib (a plotting library modeled on MATLAB's plotting tools), which helps visualize data with various
plotting tools. We will go through examples of Python code, as well as
write our own scripts. At the end of the day tomorrow, we will do a 
mini-analysis that puts all those skills together.

We will have a morning session and an afternoon session. The morning 
session is longer and will include a fifteen-minute break. The 
afternoon session will be a shorter session at the end of the day, after
listening to some speakers talk about how they use Python in their work.

Here's our tentative schedule:

**Monday**

***9am-10:15am***   Introduction to Jupyter Notebooks. Introduction to Python 
              syntax (data types, if/else statements, indexing, string 
              slicing). Introduction to importing packages. Brief 
              introduction to NumPy.

***10:30am-12pm***  Introduction to more advanced data types (dictionaries) 
              and their functionality. Introduction to for/while
              loops and list comprehensions. More practice with NumPy.
              
***12pm-1pm***      Lunch

***1pm-3:15pm***    Speakers

***3:30pm-5pm***    Introduction to functions. Introduction to matplotlib.
              We'll start to learn what's cool in our IMDB data by
              making plots of movie popularity.
              
**Tuesday**

***9:45am-12pm***   Tour of the Texas Advanced Computing Center (TACC)
              
***12pm-1pm***      Lunch

***1pm-3:15pm***    Speakers

***3:30pm-5pm***    Mini-analysis of the IMDB data set, including questions
              and presentations to fellow tutorial-attendees.


You should have a notepad of Enthought post-it notes. Please take two of them.
On one post-it note, draw a smiley face or write "I'm finished". On the other,
draw a frowning face or write "I need help." As we are going through the 
material, if you have finished the tutorial task, put your smiley face post-it 
note in front of you to let me know you're done. If you are confused about
the task, put up your frowning post-it note. In addition, if you have a question
or for any reason want to talk to someone about the tutorial material, feel
free to raise your hand to ask me or one of the tutorial volunteers for
assistance. That's what we're here for!

## Introduction to Jupyter Notebooks

Set up your computer for the tutorial:
- Access the files from the link https://github.com/eqfinney/SciPy
- Click "clone or download" in the upper right corner.
- Click "download zip" and wait for the file to download.
- Extract the files. You should then be able to open them in your
  Canopy environment.

We'll look around at my notebooks, and then you'll try creating your own.

In [2]:
# what is a notebook
# how to add a cell
# how to run a cell
# a little bit about Python error messages
2+2

4

## Introduction to basic Python syntax
Many of you are already familiar with the basic syntax of Python. But we're
going to spend this first section reviewing, to make sure everyone is on
the same page for later in the day. I learned some new stuff about Python
while preparing for this tutorial, so even if you are already familiar
with basic Python syntax be on the lookout for cool new tidbits! Again, if
you feel lost or stuck, put up your frowning post-it note or raise your hand.
The tutorial volunteers and I are happy to help.

## Data types

If you've used a programming language before, you are probably familiar
with some of the objects you can use to communicate with your computer.

In [36]:
# ints, floats, booleans, strings
2.0
print(type(2.0))

2
print(type(2))

True
print(type(True))

"Hi scipy"
print(type("Hi scipy"))

<class 'float'>
<class 'int'>
<class 'bool'>
<class 'str'>


In [171]:
# lists
[1,2,3]
print(type([1,2,3]))

['h','i',' ','s','c','i','p','y']
print(type(['h','i',' ','s','c','i','p','y']))

<class 'list'>
<class 'list'>


In [6]:
# assigning variables
movie_title = "Batman Begins"
print(movie_title)
print(type(movie_title))

Batman Begins
<class 'str'>


# Exercise:
Form into groups of 2-4 people and introduce yourselves! 
Take a few minutes to make one or two objects of each data type.
Discuss with your group an example of when each data type might be used
in the real world.

# Exercise:
What happens if you try using mathematical operations (like + or -) on
different data types? For example, can you add two strings? Can you 
subtract an int and a float? Try several examples to see what happens.
Then (if you aren't already) discuss your findings with your group.

## Concatenating, indexing, slicing
So now that we've got types down, I'd like to start using slightly more 
interesting examples.

In [7]:
# concatenating
movie_title + " is awesome!"
print(movie_title + " is awesome!")

Batman Begins is awesome!


In [8]:
# indexing
movie_title[0]
print(movie_title[0])

B


In [9]:
movie_title[1]
print(movie_title[1])

a


In [10]:
movie_title[-1]
print(movie_title[-1])

s


In [11]:
# slicing
print(movie_title[6:8])

 B


In [12]:
# finding the length of a string
len(movie_title)
print(movie_title)
print(len(movie_title))

Batman Begins
13
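
The slicing cell above gives both a start and an end. As a small sketch of
a few other ways to slice (the variable names first_word, second_word, and
last_three are made up for this example), you can leave out one side of the
colon or count from the end with negative numbers:

```python
movie_title = "Batman Begins"

# leaving out the start means "from the beginning"
first_word = movie_title[:6]
print(first_word)

# leaving out the end means "through the last character"
second_word = movie_title[7:]
print(second_word)

# negative numbers count from the end of the string
last_three = movie_title[-3:]
print(last_three)
```

The same slices work on lists, which may come in handy for the exercises.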


# Exercise:
Think of a sentence you could use as a string, and assign it to a variable.
Then, print that sentence to your screen with no spaces.
Finally, find the length of your sentence, with spaces and without spaces.
Subtract these two values to find how many spaces your sentence has!

# Exercise:
It turns out that many of the tricks we can do with strings (indexing, 
slicing, concatenating, finding the length) are also things we can do with
lists! Create a list of your group's favorite movies. Then try:
- Indexing the list (for example, finding and printing out just one person's
  favorite movie)
- Slicing the list (finding and printing out a small section of the list)
- Concatenating two lists together
- Finding the length of a list

# Exercise:
Discuss with your group how we might access just one letter of one string
in a list of strings. For example, how would we access the letter 'Z' in
['potato', 'carrot', 'onion', 'pepper', 'Zucchini', 'celery']?

In [16]:
# If/else
# introductory example 
name = "Emily"
if name == "Huaying":
    print("Hello Huaying!")
elif name == "Christian":
    print("Hello Christian!")
elif name == "Jim":
    print("Hello Jim!")
else: 
    print("Hello Daniel!")

Hello Daniel!


In [17]:
# another example

name = input("What is your name? ")
answer = input("Are you a Teen Track volunteer? ")

if answer == "yes":
    if name == "Huaying":
        print("Thanks for helping, Huaying!")
    elif name == "Christian":
        print("Thanks for helping, Christian!")
    elif name == "Jim":
        print("Thanks for helping, Jim!")
    elif name == "Daniel": 
        print("Thanks for helping, Daniel!")
    else:
        print("Pleased to meet you!")
else:
    print("Welcome to the tutorial, " + name)

What is your name? Emily
Are you a Teen Track volunteer? No
Welcome to the tutorial, Emily


In [18]:
# another example, but with animals and movies this time

answer = input("What is your favorite animal? ")
length = len(answer)
if answer[1:length] == "at":
    print("Your favorite movie must be " + str(answer) + "man!")
elif answer == "spider":
    print("Your favorite movie must be " + str(answer) + "man!")
else:
    print("I don't think you even like movies.")

What is your favorite animal? Rat
Your favorite movie must be Ratman!
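
One thing to watch for in the examples above: comparing strings with == is
case-sensitive, so "Spider" does not equal "spider" and "Yes" does not equal
"yes". Here is one possible way to handle that, sketched with a fixed string
standing in for input() so the cell runs on its own. The names cleaned and
message are made up for this example.

```python
# stand-in for input(), so the cell runs without typing anything
answer = "  Spider "

# strip() removes spaces at the ends; lower() makes every letter lowercase
cleaned = answer.strip().lower()

if cleaned[1:] == "at":
    message = "Your favorite movie must be " + cleaned.capitalize() + "man!"
elif cleaned == "spider":
    message = "Your favorite movie must be " + cleaned.capitalize() + "man!"
else:
    message = "I don't think you even like movies."

print(message)
```

Note that cleaned[1:] does the same job as answer[1:length] in the cell
above: leaving out the end of a slice means "go to the end of the string".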


# Challenge Exercise:
Create a VERY short choose-your-own-adventure game!
Let the user make up to two choices before the game ends. 
For example, you could start the game by printing a few sentences of a story,
and then ask the user what to do using the command input().
The next step of the story depends on the user's answer.

## Importing packages
A package is a collection of Python operations that go above and
beyond what you might use in every program. For instance, I may not 
need to calculate a square root every day. But if I do, I can use the 
math package.

In [19]:
import math
print(math.sqrt(9))

3.0


In [20]:
from math import sqrt
print(sqrt(9))

3.0


In [21]:
import math as m
print(m.sqrt(9))

3.0


In [22]:
from math import sqrt as sq
print(sq(9))

3.0


## Introduction to NumPy
Later in this tutorial, we are going to be working with numerical data 
about movies (like the movie's budget, average IMDB score, number of
Facebook likes, etc.). When we want to store numerical data in a table,
one of the best packages to use is NumPy, which stands for Numerical
Python. So we're going to import the NumPy package and spend the rest
of our time in this section of the tutorial playing with NumPy data
types.

We load a NumPy table by using the loadtxt command. The documentation 
is here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
There are various options we can use to load the text. One of them,
delimiter, tells NumPy which character separates the columns in our table.
By default loadtxt splits columns on whitespace, which is what our file
uses, so we can leave that option out.
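
In np.loadtxt, the keyword that controls how columns are separated is
delimiter. The sketch below uses tiny made-up tables (not the IMDB data),
wrapped in io.StringIO so they behave like files, to compare the default
whitespace splitting with a comma delimiter. The names space_text,
comma_text, space_table, and comma_table are invented for this example.

```python
import io
import numpy as np

# made-up data standing in for files on disk; loadtxt also accepts file-like objects
space_text = io.StringIO("1 2 3\n4 5 6")
comma_text = io.StringIO("1,2,3\n4,5,6")

# default behavior: columns are split on whitespace
space_table = np.loadtxt(space_text)

# delimiter=',' tells loadtxt to split columns on commas instead
comma_table = np.loadtxt(comma_text, delimiter=',')

print(space_table)
print(comma_table)
```

Both calls should produce the same table with 2 rows and 3 columns.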

In [27]:
import numpy as np
table = np.loadtxt('imdb_numerical.txt')

In [28]:
# NumPy rows
print(table[0])

[  0.00000000e+00   7.23000000e+02   1.78000000e+02   0.00000000e+00
   8.55000000e+02   1.00000000e+03   7.60505847e+08   8.86204000e+05
   4.83400000e+03   3.05400000e+03   2.37000000e+08   2.00900000e+03
   9.36000000e+02   7.90000000e+00   3.30000000e+04]


In [29]:
# NumPy columns
print(table[:,0])

[  0.00000000e+00   1.00000000e+00   3.00000000e+00 ...,   5.03500000e+03
   5.03700000e+03   5.04200000e+03]


In [30]:
# accessing a single element
data_point = table[5,3]
print(data_point)

15.0
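
The IMDB table is big, so it can be hard to see what each kind of indexing
is doing. Here is a small sketch with a made-up array of 3 rows and 4
columns (the name small is just for this example) that uses the same three
patterns: a whole row, a whole column, and a single element.

```python
import numpy as np

# a small made-up table with 3 rows and 4 columns
small = np.array([[1.0, 2.0, 3.0, 4.0],
                  [5.0, 6.0, 7.0, 8.0],
                  [9.0, 10.0, 11.0, 12.0]])

print(small.shape)    # (number of rows, number of columns)
print(small[0])       # first row
print(small[:, 0])    # first column
print(small[1, 2])    # row 1, column 2 (counting from 0)
```

In small[1, 2], the first number picks the row and the second picks the
column, just like table[5,3] above.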


You may be wondering: how exactly do we know what columns are what? 
The short answer is, NumPy doesn't have a very straightforward way 
to label columns. (There is another package, Pandas, that is great for
this, but we won't have time to cover it.) Use your text editor to 
open up the file 'imdb_numerical.txt' and read the first line of the 
file to get the column names. In the future, we may choose to name 
individual columns so we don't have to remember that column 1 is the 
number of critic reviews, for instance.

In [32]:
# 'id' is also the name of a built-in Python function, so we use movie_id instead
movie_id = table[:,0]
num_critic_for_reviews = table[:,1]
# etc.
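
Instead of reading the header line by eye, we could also ask Python to read
it for us. This is only a sketch: the header text below is made up (the real
first line of 'imdb_numerical.txt' may be formatted differently), and the
names fake_file, column_names, and duration_column are invented for this
example.

```python
import io

# made-up stand-in for the top of 'imdb_numerical.txt'; the real header may look different
fake_file = io.StringIO("# id num_critic_for_reviews duration\n0 723 178\n")

# read only the first line, drop the leading '#', and split on spaces
header = fake_file.readline()
column_names = header.lstrip('#').split()
print(column_names)

# list.index tells us which column number goes with a name
duration_column = column_names.index('duration')
print(duration_column)
```

With a real file, open('imdb_numerical.txt') would take the place of the
io.StringIO stand-in.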

That's all we'll go over for now. We'll have lots more time to use NumPy.
In the next section, we'll talk about how to add and remove items from 
lists, and we'll learn about a new data type, dictionaries. We'll also 
learn how to create loops, which allow us to quickly perform commands 
on a lot of data. 