## 1. Programming And Data Science

Data science involves processing, analyzing, and visualizing data. While some tools like Microsoft Excel allow us to perform basic data science tasks, they're limited to the functionality built into the user interface. If you want to work with datasets that aren't structured like a spreadsheet, or create entirely new data visualizations from scratch, you'll need to become proficient in **programming**. Instead of using a program written by others to solve a narrow set of tasks, you can write your own programs that solve your specific problems.

Programming involves organizing a collection of instructions into a program for a computer to carry out. To express these instructions, we use a **programming language**. In this track, we focus on a language called Python. **Python** is a general-purpose programming language that is becoming more and more popular for doing **data science**. Companies worldwide are using Python to harvest insights from their data and get a competitive edge.


**Python** is a popular choice for working with data because it has good support for:

- handling large datasets
- working with common mathematical functions
- creating powerful data visualizations


What you will learn:
- Python versions
- Basic data types
- Lists
- Files and Loops
- Booleans
- If statements
- Dictionaries
- Functions and Packages

## 2. Python versions

When this material was written, two versions of Python were supported: 2.7 and 3.6. Python 3.x introduced many backwards-incompatible changes to the language, so code written for 2.7 may not work under 3.x and vice versa. Python 2.7 has since reached its end of life (January 1, 2020). For this class, all code uses Python 3.6.

Further reading: https://wiki.python.org/moin/Python2orPython3

> "Python 2.x is legacy, Python 3.x is the present and future of the language"
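To check which version a notebook is running, you can inspect `sys.version_info`. The sketch below also shows one of the backwards-incompatible changes mentioned above: in Python 3, `/` always performs true division and `//` performs floor division, whereas in Python 2, `7 / 2` on two integers evaluated to `3`.

```python
import sys

# sys.version_info holds (major, minor, micro, releaselevel, serial)
print(sys.version_info.major, sys.version_info.minor)

# True division vs. floor division in Python 3
true_div = 7 / 2    # 3.5
floor_div = 7 // 2  # 3
print(true_div, floor_div)  # prints "3.5 3"
```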

## 3. Basic data types

Like most languages, Python has a number of basic types including integers, floats, booleans, and strings. These data types behave in ways that are familiar from other programming languages.




In [None]:
x = 3
print(type(x)) # Prints "<class 'int'>"
print(x)       # Prints "3"
print(x + 1)   # Addition; prints "4"
print(x - 1)   # Subtraction; prints "2"
print(x * 2)   # Multiplication; prints "6"
print(x ** 2)  # Exponentiation; prints "9"
x += 1
print(x)       # Prints "4"
x *= 2
print(x)       # Prints "8"
y = 2.5
print(type(y)) # Prints "<class 'float'>"
print(y, y + 1, y * 2, y ** 2) # Prints "2.5 3.5 5.0 6.25"

Note that unlike many languages, **Python** does not have unary increment **(x++)** or decrement **(x--)** operators. Python also has a built-in type for complex numbers; you can find all of the details in the [documentation](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex).
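Because `x++` is not available, Python code uses augmented assignment (`x += 1`) instead. One subtle trap: `++x` is still valid syntax, but it parses as two unary plus signs, `+(+x)`, so it leaves `x` unchanged. The sketch also shows the built-in `complex` type, whose literals use a `j` suffix for the imaginary part.

```python
x = 5
++x          # valid, but only evaluates +(+x); x is NOT incremented
print(x)     # prints "5"
x += 1       # the Python way to increment
print(x)     # prints "6"

z = 3 + 4j               # complex literal: real part 3, imaginary part 4
print(type(z))           # prints "<class 'complex'>"
print(z.real, z.imag)    # prints "3.0 4.0"
print(abs(z))            # magnitude sqrt(3**2 + 4**2); prints "5.0"
```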

**Booleans**: Python implements all of the usual operators for Boolean logic, but uses English words rather than symbols (&&, ||, etc.):

In [None]:
t = True
f = False
print(type(t)) # Prints "<class 'bool'>"
print(t and f) # Logical AND; prints "False"
print(t or f)  # Logical OR; prints "True"
print(not t)   # Logical NOT; prints "False"
print(t != f)  # Logical XOR; prints "True" 

**Strings**: Python has great support for strings:


In [None]:
hello = 'hello'           # String literals can use single quotes
world = "world"           # or double quotes; it does not matter.
print(hello)              # Prints "hello"
print(len(hello))         # String length; prints "5"
hw = hello + ' ' + world  # String concatenation
print(hw)                 # prints "hello world"
hw12 = '%s %s %d' % (hello, world, 12)  # sprintf style string formatting
print(hw12)  # prints "hello world 12"
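The `%` operator is the older formatting style. Python 3 also offers `str.format()` and, starting with Python 3.6 (the version used in this class), f-strings, which embed expressions directly inside the string literal:

```python
hello = 'hello'
world = "world"

hw12_format = '{} {} {}'.format(hello, world, 12)  # str.format()
hw12_fstring = f'{hello} {world} {12}'              # f-string (Python 3.6+)
print(hw12_format)   # prints "hello world 12"
print(hw12_fstring)  # prints "hello world 12"

# Format specifiers work too, e.g. two decimal places:
print(f'{2.5 ** 2:.2f}')  # prints "6.25"
```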

String objects have many useful methods; you can find a list of all of them in the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods).

In [None]:
s = "hello"
print(s.capitalize())  # Capitalize a string; prints "Hello"
print(s.upper())       # Convert a string to uppercase; prints "HELLO"
print(s.rjust(7))      # Right-justify a string, padding with spaces; prints "  hello"
print(s.center(7))     # Center a string, padding with spaces; prints " hello "
print(s.replace('l', '(ell)'))  # Replace all instances of one substring with another;
                                # prints "he(ell)(ell)o"
print('  world '.strip())       # Strip leading and trailing whitespace; prints "world"
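Two more string methods matter a lot for data work, and they reappear when we read files later: `split()` breaks a string into a list of substrings at a delimiter, and `join()` does the reverse. The sample row below is only an illustration.

```python
row = "Natal,Rio Grande do Norte,Brazil"   # a comma-separated line of text
fields = row.split(",")                      # split on commas
print(fields)        # prints "['Natal', 'Rio Grande do Norte', 'Brazil']"
print(len(fields))   # prints "3"

rejoined = " | ".join(fields)                # join with a different separator
print(rejoined)      # prints "Natal | Rio Grande do Norte | Brazil"
```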

## 4. Using a list to store multiple values

So far, we've been storing individual values in variables. Often in data science, we're working with thousands of data points that are grouped together and have an order to them. We need a container that can hold multiple values and let us perform operations on them. For that, we can use a **list**, an object that represents a sequence of values. For example, we can represent the cities in our dataset as a list of strings (<span style="background-color: #F9EBEA; color:##C0392B">"Natal"</span>, <span style="background-color: #F9EBEA; color:##C0392B">"São Paulo"</span>, and so on).

A list is Python's counterpart to an array, but it is resizable and can contain elements of different types. You can find more about lists in the [documentation](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists).

In [None]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Print out areas
print(areas)

# Print out the type of areas
print(type(areas))

# Print out second element from areas
print(areas[1])

# Print out last element from areas
print(areas[-1])

# Print out the area of the living room
print(areas[-5])

# Add two new elements to the end of the list
areas.append("laundry")
areas.append(8.75)

# Print out the new list
print(areas)

# Lists have a feature called slicing that allows you to return all of the values between
# a starting index and an ending index. When you slice a list, you return a new list containing
# just the values you're interested in.

# Remove the first two elements of the list (the slice covering indexes 0 and 1)
del areas[0:2]
print(areas)

# Remove the last two elements of the list
del areas[-2:]
print(areas)
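The comment above describes slicing, but the cell only uses it together with `del`. The sketch below shows slicing on its own: the start index is included, the end index is excluded, and the result is a new list, so changing the slice leaves the original untouched.

```python
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0]

kitchen = areas[2:4]        # elements at index 2 and 3 (index 4 is excluded)
print(kitchen)              # prints "['kitchen', 18.0]"
print(areas[:2])            # omitted start means "from the beginning"; prints "['hallway', 11.25]"
print(areas[4:])            # omitted end means "to the end"; prints "['living room', 20.0]"

kitchen.append("extra")     # modifying the slice...
print(len(areas))           # ...does not change the original list; prints "6"
```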

## 5. Files and Loops

We'll learn how to work with files and use loops to iterate through lists. We'll be working with crime rate data for 73 cities in the United States. Datasets are often represented in files that you can download and manipulate. Before we get started, we'll first need to learn how to work with files in Python.

To open a file in Python, we use the <span style="background-color: #F9EBEA; color:##C0392B">open()</span> function. This function accepts two different arguments (inputs) in the parentheses, always in the following order:

- the name of the file (as a string)
- the mode of working with the file (as a string)


We'll learn about the various modes in a later mission. For now, we'll just use <span style="background-color: #F9EBEA; color:##C0392B">"r"</span>, the mode for reading in files.

When entering multiple inputs, separate them with commas (<span style="background-color: #F9EBEA; color:##C0392B">,</span>). For example, to open a file named <span style="background-color: #F9EBEA; color:##C0392B">story.txt</span> in read mode, we write the following:

>```python
open("story.txt", "r")
```

The <span style="background-color: #F9EBEA; color:##C0392B">open()</span> function returns a *File* object. This object stores the information we passed in, and allows us to call methods specific to the File class. We can assign the File object to a variable so we can refer to it later:

>```python
a = open("story.txt", "r")
```

Note that the File object, <span style="background-color: #F9EBEA; color:##C0392B">a</span>, won't contain the actual contents of the file. It's instead an object that acts as an interface to the file and contains methods for reading in and modifying the file's contents (which we'll cover in the next screen).




In [None]:
#open the file
f = open("crime_rates.csv", "r")

#read the file
data = f.read()

#print data
print(type(data))
print(data)
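Files opened with `open()` should be closed when you're done with them. A common, safer pattern is the `with` statement, which closes the file automatically when the block ends, even if an error occurs inside it. So that the sketch runs on its own, it first writes a small hypothetical file, `sample_rates.csv`, with made-up values (not the course's crime data) to the system's temporary directory.

```python
import os
import tempfile

# Write a small hypothetical file so this sketch is self-contained
path = os.path.join(tempfile.gettempdir(), "sample_rates.csv")
with open(path, "w") as out:
    out.write("CityA,100\nCityB,250\n")

# Read it back; the with-block closes the file automatically when it ends
with open(path, "r") as f:
    data = f.read()

print(f.closed)      # prints "True"
print(type(data))    # prints "<class 'str'>"
```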

In [None]:
#split the contents of crime_rates.csv into rows on the newline character '\n'
rows = data.split('\n')

#print the first five rows
print(rows[0:5])
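If a file ends with a newline character (many do), `split('\n')` leaves an empty string as the last element, which breaks any code that expects every row to contain a comma. `str.splitlines()` does not produce that trailing empty element. The text below is a hypothetical example:

```python
text = "CityA,100\nCityB,250\n"     # hypothetical contents ending in a newline

print(text.split("\n"))     # prints "['CityA,100', 'CityB,250', '']"
print(text.splitlines())    # prints "['CityA,100', 'CityB,250']"
```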

In [None]:
#create an empty list
int_crime_rates = []

#convert the crime rate of each city (the second column) to an int
for row in rows:
    if row:  # skip empty lines, e.g. one left by a trailing newline
        int_crime_rates.append(int(row.split(",")[1]))

In [None]:
type(int_crime_rates[0])

## 6. Booleans

We'll learn how to express conditional logic. We can use **conditional logic** to add criteria to the code we write. Some examples of operations that use criteria include:

- Finding all the integers in a list that are greater than <span style="background-color: #F9EBEA; color:##C0392B">5</span>.
- Identifying which elements in a list are strings, and printing only those values.

We can break down both of these examples into logic we can code:

- For each integer in a list, if the integer is greater than <span style="background-color: #F9EBEA; color:##C0392B">5</span>, add it to the list <span style="background-color: #F9EBEA; color:##C0392B">greater_than_five</span>.
- For each element in a list, if the value has a string data type, use the <span style="background-color: #F9EBEA; color:##C0392B">print()</span> function to display it; if it's not a string, ignore it.

Python has a data type called **Boolean** (class <span style="background-color: #F9EBEA; color:##C0392B">bool</span>) that helps express conditional logic. There are only two Boolean values: <span style="background-color: #F9EBEA; color:##C0392B">True</span> and <span style="background-color: #F9EBEA; color:##C0392B">False</span>. Because they're words, Boolean values may look like strings, but they're an entirely separate type. For example, string operations like concatenation won't work with Booleans.

The following code example assigns <span style="background-color: #F9EBEA; color:##C0392B">True</span> to <span style="background-color: #F9EBEA; color:##C0392B">t</span> and <span style="background-color: #F9EBEA; color:##C0392B">False</span> to <span style="background-color: #F9EBEA; color:##C0392B">f</span>:

>```python
t = True
f = False
```

If we display the data type for either <span style="background-color: #F9EBEA; color:##C0392B">t</span> or <span style="background-color: #F9EBEA; color:##C0392B">f</span>, we'll see class <span style="background-color: #F9EBEA; color:##C0392B">'bool'</span>, shorthand for Boolean.
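The sketch below confirms the type and shows the concatenation failure described above: adding a string and a Boolean raises a `TypeError`, while converting the Boolean with `str()` first works.

```python
t = True
f = False
print(type(t))            # prints "<class 'bool'>"
print(type("True"))       # prints "<class 'str'>": quotes make it a string

try:
    "Result: " + t        # concatenating a str and a bool fails
except TypeError as err:
    print("TypeError:", err)

print("Result: " + str(t))  # convert first; prints "Result: True"
```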


### 6.1 Boolean Operators

Python has comparison operators that allow us to compare variables:

<span style="background-color: #F9EBEA; color:##C0392B">==</span> returns True if both variables are equivalent, and False if they're different

<span style="background-color: #F9EBEA; color:##C0392B">!=</span> returns True if both variables are different, and False if they're equivalent

<span style="background-color: #F9EBEA; color:##C0392B">></span> returns True if the first variable is greater than the second variable, and False otherwise

<span style="background-color: #F9EBEA; color:##C0392B"><</span> returns True if the first variable is less than the second variable, and False otherwise

<span style="background-color: #F9EBEA; color:##C0392B">>=</span> returns True if the first variable is greater than or equal to the second variable, and False otherwise

<span style="background-color: #F9EBEA; color:##C0392B"><=</span> returns True if the first variable is less than or equal to the second variable, and False otherwise

In [None]:
print(8 == 8) # True
print(8 != 8) # False
print(8 == 10) # False
print(8 != 10) # True

In [None]:
rates = [10, 15, 20]
print(rates[0] > rates[1]) # False
print(rates[0] >= rates[0]) # True
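Each comparison produces a Boolean value, so you can store the result in a variable. Python also lets you chain comparisons: `a < b < c` means `a < b and b < c`. Strings can be compared too; they are ordered character by character.

```python
rates = [10, 15, 20]

is_increasing = rates[0] < rates[1] < rates[2]   # same as (10 < 15) and (15 < 20)
print(is_increasing)                             # prints "True"
print(type(is_increasing))                       # prints "<class 'bool'>"

# Strings compare character by character
print("apple" < "banana")                        # prints "True"
```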

## 7. If Statements

Now that we know how to work with Boolean values, let's dive into how we use them to express conditional logic. To complement Booleans, Python has the **if** statement, which tests whether a condition holds. The condition evaluates to either <span style="background-color: #F9EBEA; color:##C0392B">True</span> or <span style="background-color: #F9EBEA; color:##C0392B">False</span>, and the indented code below the if statement runs only when it is <span style="background-color: #F9EBEA; color:##C0392B">True</span>.

>```python
sample_rate = 749
greater = (sample_rate > 5)
if greater:                    #This is the conditional statement.
    print(sample_rate)
```

We can nest if statements to specify multiple criteria. 

>```python
value = 1500
if value > 500:
    if value > 1000:
        print("This number is HUGE!")
```
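Nested if statements can often be flattened with the `and` operator. (Here `value > 500 and value > 1000` is redundant, since it is the same as `value > 1000`, but it mirrors the nested example.) When you need to pick exactly one of several branches, `elif` and `else` express that directly:

```python
value = 1500

# Equivalent to the nested version above
if value > 500 and value > 1000:
    print("This number is HUGE!")

# Choose exactly one branch; conditions are checked from top to bottom
if value > 1000:
    size = "huge"
elif value > 500:
    size = "large"
else:
    size = "small"
print(size)   # prints "huge"
```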

We can also nest if statements within for loops, and vice versa. For example, we can search a list for the existence of a specific value by combining a for loop with an if statement. The if statement determines whether the current element is equivalent to the value we're interested in:

>```python
cities = ['Natal', 'São Paulo', 'João Pessoa']  # example list
found = False
for city in cities:
    if city == 'João Pessoa':
        found = True
print(found)  # prints "True"
```
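The two criteria described at the start of the Booleans section can now be written with a for loop and an if statement. The list `values` below is a hypothetical example, and `isinstance()` checks whether an object has a given type. For a simple membership test like the `found` loop, Python's `in` operator does the same job in a single expression.

```python
values = [3, 8, "Natal", 12, 5, "Recife"]   # hypothetical mixed list

# 1) Collect the integers greater than 5
greater_than_five = []
for v in values:
    if isinstance(v, int) and v > 5:
        greater_than_five.append(v)
print(greater_than_five)    # prints "[8, 12]"

# 2) Print only the string elements
for v in values:
    if isinstance(v, str):
        print(v)            # prints "Natal", then "Recife"

# Membership test without an explicit loop
cities = ["Natal", "São Paulo", "João Pessoa"]
print("João Pessoa" in cities)   # prints "True"
```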

<br>
<div class="alert alert-info">
<b>Exercise Start.</b>
</div>

**Description**: 

For this challenge, you'll be working with the data set behind this [FiveThirtyEight](http://fivethirtyeight.com/features/there-are-922-unisex-names-in-america-is-yours-one-of-them/) article on common unisex (gender-neutral) names in the United States. You'll start by reading in the file and iteratively converting the data to more useful representations. At the end of this challenge, you'll filter the data so that it only includes the names that at least 1,000 people share.

The staff at [FiveThirtyEight](http://fivethirtyeight.com/) compiled this data set from information at the [Social Security Administration's website](https://www.ssa.gov/oact/babynames/limits.html). You'll work with a shortened version of the full data set to complete this challenge.

The shortened data set is in a CSV file named [unisex_names_table.csv](https://github.com/fivethirtyeight/data/blob/master/unisex-names/unisex_names_table.csv) (tip: on GitHub, click **Raw** and then save the page to download the file).

>1) Use the <span style="background-color: #F9EBEA; color:##C0392B">open()</span> function with the following parameters to return a File object: 

>>+ <span style="background-color: #F9EBEA; color:##C0392B">unisex_names_table.csv</span> for the file name.
+ <span style="background-color: #F9EBEA; color:##C0392B">r</span> for read mode
+ Then, use the <span style="background-color: #F9EBEA; color:##C0392B">read()</span> method of the File object to read the file into a string. Assign that string to a variable named <span style="background-color: #F9EBEA; color:##C0392B">names</span>.
>```python
data = open("unisex_names_table.csv","r")
names = data.read()
```

>2) Convert the string to a list

>>+ Use the <span style="background-color: #F9EBEA; color:##C0392B">split()</span> method that *strings* have to split on the new-line delimiter (<span style="background-color: #F9EBEA; color:##C0392B">"\n"</span>), and assign the resulting list to <span style="background-color: #F9EBEA; color:##C0392B">names_list</span>.
+ Select the first five elements in <span style="background-color: #F9EBEA; color:##C0392B">names_list</span>, and assign them to <span style="background-color: #F9EBEA; color:##C0392B">first_five</span>.
+ Display <span style="background-color: #F9EBEA; color:##C0392B">first_five</span> using the <span style="background-color: #F9EBEA; color:##C0392B">print()</span> function.
>```python
names_list = names.split("\n")
#skip the header and tail
names_list = names_list[1:-1]
first_five = names_list[0:5]
print(first_five)
```

>3) Convert the list of strings to a list of lists. Split each element in <span style="background-color: #F9EBEA; color:##C0392B">names_list</span> on the comma delimiter <span style="background-color: #F9EBEA; color:##C0392B">(,)</span> and append the resulting list to <span style="background-color: #F9EBEA; color:##C0392B">nested_list</span>. To accomplish this:

>>+ Create an empty list and assign it to <span style="background-color: #F9EBEA; color:##C0392B">nested_list</span>.
+ Write a for loop that iterates over <span style="background-color: #F9EBEA; color:##C0392B">names_list</span>.
+ Within the loop body, run the <span style="background-color: #F9EBEA; color:##C0392B">split()</span> method on each element to return a list (assign that list to <span style="background-color: #F9EBEA; color:##C0392B">comma_list</span>).
+ Within the loop body, run the <span style="background-color: #F9EBEA; color:##C0392B">append()</span> method to add each list (<span style="background-color: #F9EBEA; color:##C0392B">comma_list</span>) to <span style="background-color: #F9EBEA; color:##C0392B">nested_list</span>.
+ Use the <span style="background-color: #F9EBEA; color:##C0392B">print()</span> function to display the first five elements in <span style="background-color: #F9EBEA; color:##C0392B">nested_list</span>
>```python
nested_list = []
for i in names_list:
    comma_list = i.split(',')
    nested_list.append(comma_list)
print(nested_list[0:5])
```

>4) Create a new list of strings called <span style="background-color: #F9EBEA; color:##C0392B">thousand_or_greater</span> that only contains the names shared by 1,000 people or more. To accomplish this:

>>+ First, build <span style="background-color: #F9EBEA; color:##C0392B">numerical_list</span>, a list of <span style="background-color: #F9EBEA; color:##C0392B">[name, total]</span> pairs in which the total is converted to a float. As the code below assumes, the name is at index 1 and the total at index 2 of each row in <span style="background-color: #F9EBEA; color:##C0392B">nested_list</span>.
>```python
numerical_list = []
for i in nested_list:
    a = i[1]           # name
    b = float(i[2])    # total number of people who share the name
    numerical_list.append([a, b])
```
+ Create an empty list and assign it to <span style="background-color: #F9EBEA; color:##C0392B">thousand_or_greater</span>.
+ Write a for loop that iterates over <span style="background-color: #F9EBEA; color:##C0392B">numerical_list</span>.
+ In the loop body, use an if statement to determine if the value at index <span style="background-color: #F9EBEA; color:##C0392B">1</span> for that element (which is a <span style="background-color: #F9EBEA; color:##C0392B">[name, total]</span> list) is greater than or equal to <span style="background-color: #F9EBEA; color:##C0392B">1000</span>.
+ If the value is greater than or equal to <span style="background-color: #F9EBEA; color:##C0392B">1000</span>, use the <span style="background-color: #F9EBEA; color:##C0392B">append()</span> method to add its name (index <span style="background-color: #F9EBEA; color:##C0392B">0</span>) to <span style="background-color: #F9EBEA; color:##C0392B">thousand_or_greater</span>.
+ Finally, display the first <span style="background-color: #F9EBEA; color:##C0392B">10</span> elements in <span style="background-color: #F9EBEA; color:##C0392B">thousand_or_greater</span>.
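The whole pipeline (split into lines, split each line on commas, convert the total to a number, then filter) can be tried on a small in-memory string before running it on the real file. The names and totals below are made up for illustration and do not come from the FiveThirtyEight data; the sketch assumes, like the code above, that the name is in the second column and the total in the third.

```python
# Hypothetical CSV text: an unnamed index column, then name, then total
sample = ",name,total\n1,NameA,2500.5\n2,NameB,999.0\n3,NameC,1000.0\n"

rows = sample.split("\n")[1:-1]      # skip the header and the trailing empty string
numerical = []
for row in rows:
    fields = row.split(",")
    numerical.append([fields[1], float(fields[2])])   # [name, total]

thousand_or_greater = []
for name, total in numerical:
    if total >= 1000:
        thousand_or_greater.append(name)
print(thousand_or_greater)   # prints "['NameA', 'NameC']"
```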

## 8. Dictionaries

A dictionary is like a list in that it has indexes, but the indexes (called **keys**) aren't necessarily sequential numbers. We can create our own keys from values of many data types, including strings, numbers, and tuples. Keys must be immutable (more precisely, hashable), so a list cannot be used as a key.

A dictionary stores (key, value) pairs, similar to a Map in Java or an object in JavaScript.

In [70]:
# Definition of countries and capital
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']

# Get index of 'germany': ind_ger
ind_ger = countries.index('germany')

# Use ind_ger to print out capital of Germany
print(capitals[ind_ger])

berlin


In [71]:
# From the strings in countries and capitals, create dictionary europe
europe = {'spain':'madrid','france':'paris','germany':'berlin','norway':'oslo'}

# Print europe
print(europe)

# Print out the keys in europe
print(europe.keys())

# Print out value that belongs to key 'norway'
print(europe['norway'])

{'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo'}
dict_keys(['spain', 'france', 'germany', 'norway'])
oslo
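Instead of typing the dictionary by hand, you can build it from the two lists in the first cell of this section: `zip()` pairs up the elements position by position, and `dict()` turns those pairs into key-value entries.

```python
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']

europe = dict(zip(countries, capitals))   # pair each country with its capital
print(europe)
# prints "{'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo'}"
print(europe['germany'])                  # prints "berlin"
```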


In [72]:
# Add italy to europe
europe['italy'] = 'rome'

# Check whether 'italy' is now a key in europe
print('italy' in europe)

# Add poland to europe
europe['poland'] = 'warsaw'

print(europe)

# Remove france from europe
del europe['france']

# Print europe
print(europe)

True
{'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo', 'italy': 'rome', 'poland': 'warsaw'}
{'spain': 'madrid', 'germany': 'berlin', 'norway': 'oslo', 'italy': 'rome', 'poland': 'warsaw'}
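Looking up a key that does not exist, such as `europe['brazil']`, raises a `KeyError`. When a missing key is expected, `dict.get()` returns a default value instead. This pattern also comes in handy when counting things with a dictionary.

```python
europe = {'spain': 'madrid', 'germany': 'berlin', 'norway': 'oslo'}

print(europe.get('spain'))             # prints "madrid"
print(europe.get('brazil'))            # missing key, no default; prints "None"
print(europe.get('brazil', 'unknown')) # missing key with a default; prints "unknown"

try:
    europe['brazil']                   # square brackets raise an error instead
except KeyError as err:
    print("KeyError:", err)            # prints "KeyError: 'brazil'"
```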


In [73]:
# Dictionary of dictionaries
europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
           'france': { 'capital':'paris', 'population':66.03 },
           'germany': { 'capital':'berlin', 'population':80.62 },
           'norway': { 'capital':'oslo', 'population':5.084 } }


# Print out the capital of France
print(europe['france']['capital'])

# Create sub-dictionary data
data = {'capital':'rome', 'population':59.83}

# Add data to europe under key 'italy'
europe['italy'] = data

# Print europe
print(europe)

paris
{'spain': {'capital': 'madrid', 'population': 46.77}, 'france': {'capital': 'paris', 'population': 66.03}, 'germany': {'capital': 'berlin', 'population': 80.62}, 'norway': {'capital': 'oslo', 'population': 5.084}, 'italy': {'capital': 'rome', 'population': 59.83}}


### 8.1 Loop over a dictionary

To loop over both the keys and the values of a dictionary in Python 3, use the **items()** method:


```python
world = { "afghanistan":30.55,
          "albania":2.77,
          "algeria":39.21 }
for key, value in world.items():
    print(key + " -- " + str(value))
```

In [74]:
# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'bonn', 
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'australia':'vienna' }
          
# Iterate over europe
for key, value in europe.items():
    print("the capital of " + key + " is " + value)

the capital of spain is madrid
the capital of france is paris
the capital of germany is bonn
the capital of norway is oslo
the capital of italy is rome
the capital of poland is warsaw
the capital of australia is vienna
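Looping over a dictionary directly gives you its keys, and `values()` gives only the values. `sorted()` is useful when you want the keys in a predictable, alphabetical order. The sketch reuses the `world` dictionary from the example above.

```python
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}

for key in world:                 # iterating a dict yields its keys
    print(key, world[key])

total = 0
for value in world.values():      # only the values
    total += value
print(round(total, 2))            # prints "72.53"

print(sorted(world))              # prints "['afghanistan', 'albania', 'algeria']"
```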


<br>
<div class="alert alert-info">
<b>Exercise Start.</b>
</div>

**Description**: 

In this mission, we'll look at daily weather data for Los Angeles (L.A.) during 2014. Here's a look at the beginning of <span style="background-color: #F9EBEA; color:##C0392B">la_weather.csv</span>, the data set we'll be working with:

>```csv
Day,Type of Weather
1,Sunny
2,Sunny
3,Sunny
4,Sunny
5,Sunny
6,Rain
7,Sunny
8,Sunny
9,Fog
10,Rain
```

The first row in our data is the header row, which contains labels for the values beneath them. As the header row indicates, each row has two values:

- <span style="background-color: #F9EBEA; color:##C0392B">Day</span> - A number from <span style="background-color: #F9EBEA; color:##C0392B">1</span> to <span style="background-color: #F9EBEA; color:##C0392B">365</span> indicating the day of the year. January 1st is 1, and December 31st is <span style="background-color: #F9EBEA; color:##C0392B">365</span>.
- <span style="background-color: #F9EBEA; color:##C0392B">Type of Weather</span> - The type of weather experienced on that day. The values that may appear here include <span style="background-color: #F9EBEA; color:##C0392B">Rain</span>, <span style="background-color: #F9EBEA; color:##C0392B">Sunny</span>, <span style="background-color: #F9EBEA; color:##C0392B">Fog</span>, <span style="background-color: #F9EBEA; color:##C0392B">Fog-Rain</span>, or <span style="background-color: #F9EBEA; color:##C0392B">Thunderstorm</span>.

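>```python
weather_data = []
f = open("la_weather.csv", 'r')
data = f.read()
f.close()
rows = data.split('\n')
for row in rows:
    if row:  # skip empty lines, e.g. one left by a trailing newline
        split_row = row.split(",")
        weather_data.append(split_row[1])
weather_data = weather_data[1:]  # drop the header value "Type of Weather"
```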

**Instructions**: 

1. Count how many times each type of weather occurs in the weather list, and store the results in a new dictionary called <span style="background-color: #F9EBEA; color:##C0392B">weather_counts</span>.

>```python
weather_counts = {}
for item in weather_data:
    if item in weather_counts:
        ...  # seen before: update the existing count
    else:
        ...  # first occurrence: start a new count
```
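After you fill in the loop above, one way to check your answer is the standard library's `collections.Counter`, a dictionary subclass built for exactly this kind of tally. The short list below is hypothetical; on the real data you would pass `weather_data` instead and compare the result with your `weather_counts`.

```python
from collections import Counter

sample_weather = ["Sunny", "Sunny", "Rain", "Fog", "Sunny"]   # hypothetical values

expected = Counter(sample_weather)
print(expected["Sunny"])   # prints "3"
print(dict(expected))      # prints "{'Sunny': 3, 'Rain': 1, 'Fog': 1}"
```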

## 9. Functions and Packages

To leverage the code that brilliant Python developers have written, you'll learn about using functions, methods and packages. This will help you to reduce the amount of code you need to solve challenging problems!


In [82]:
# Maybe you already know the name of a Python function, but you still have to figure out how to use it. 
# Ironically, you have to ask for information about a function with another function: help(). 
# In IPython specifically, you can also use ? before the function name.
# To get help on the max() function, for example, you can use one of these calls:

help(max)  # in IPython/Jupyter, ?max shows the same information

In [83]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Print out the index of the element 20.0
print(areas.index(20.0))

# Print out how often 14.5 appears in areas
print(areas.count(14.5))

# Reverse the order of the elements in areas
areas.reverse()

# Print out areas
print(areas)

2
0
[9.5, 10.75, 20.0, 18.0, 11.25]


In [84]:
# Python functions are defined using the "def" keyword. For example:

def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print(sign(x))
# Prints "negative", "zero", "positive"

negative
zero
positive


In [85]:
# We will often define functions to take optional keyword arguments, like this:

def hello(name, loud=False):
    if loud:
        print('HELLO, %s!' % name.upper())
    else:
        print('Hello, %s' % name)

hello('Ivanovitch') # Prints "Hello, Ivanovitch"
hello('Silva', loud=True)  # Prints "HELLO, SILVA!"


Hello, Ivanovitch
HELLO, SILVA!
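The `help()` function from earlier also works on functions you write yourself, provided you add a docstring: a string literal on the first line of the function body. The function `area_of_room` below is a hypothetical example that also uses an optional argument with a default value.

```python
def area_of_room(width, length=1.0):
    """Return the area of a rectangular room (width times length)."""
    return width * length

print(area_of_room(2.5, 4.0))   # prints "10.0"
print(area_of_room(3))          # uses the default length; prints "3.0"
print(area_of_room.__doc__)     # prints the docstring
```

Calling `help(area_of_room)` displays the same docstring together with the function's signature.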


**Modules** are pieces of code that other people have written to fulfill common tasks, such as generating random numbers or performing mathematical operations.

The basic way to use a module is to add **import module_name** at the top of your code and then use **module_name.var** to access the function or value named **var** in the module.

For example, as a data scientist, some notions of geometry never hurt. Let's refresh some of the basics. For a fancy clustering algorithm, you want to find the circumference $C$ and area $A$ of a circle. When the radius of the circle is $r$, you can calculate $C$ and $A$ as:

\begin{align*}
C &= 2 \pi r\\
A &= \pi r^{2}
\end{align*}

In [87]:
# Import the math package
import math

# Definition of radius
r = 0.43

# Calculate C
C = 2 * math.pi * r

# Calculate A
A = math.pi * r ** 2

# Build printout
print("Circumference: " + str(C))
print("Area: " + str(A))

Circumference: 2.701769682087222
Area: 0.5808804816487527


There is another kind of **import** that can be used if you only need certain functions from a module. These take the form **from module_name import var**, and the **var** can be used as if it were defined normally in your code. For example, to import only the **pi** constant from the previous example:

In [88]:
# Import only pi from the math module
from math import pi

print(pi)

3.141592653589793


You can import a module or object under a different name using the **as** keyword. This is mainly useful when a module or object has a long or confusing name. For example:

In [90]:
from math import sqrt as square_root

print(square_root(10))

3.1622776601683795
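The `as` keyword also works when importing a whole module, and data science code often aliases modules this way. The sketch below uses the standard library's `statistics` module under the alias `stats` (the alias name is our choice) on the `rates` list from the Booleans section.

```python
import statistics as stats   # import the whole module under a shorter name

rates = [10, 15, 20]
print(stats.mean(rates))     # arithmetic mean of the three rates
print(stats.median(rates))   # middle value of the sorted rates
```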


<br>
<div class="alert alert-info">
<b>Exercise Start.</b>
</div>

**Description**: 

Implement the previous exercise as a function. The function should take a list of weather data as its parameter and return a dictionary with the number of times each type of weather occurs.
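One way to structure this is to split the work into small functions, each taking a list and returning a new value. As a sketch of that shape only, the hypothetical helper `parse_weather_rows` below wraps the loading loop from the weather exercise: it takes the list of CSV lines and returns the list of weather types. The counting function itself is left for you to write.

```python
def parse_weather_rows(rows):
    """Return the weather type from each 'day,weather' line, skipping the header.

    `rows` is a list of strings such as "1,Sunny"; empty lines are ignored.
    """
    weather = []
    for row in rows[1:]:          # rows[0] is the "Day,Type of Weather" header
        if row:                   # skip blank lines, e.g. from a trailing newline
            weather.append(row.split(",")[1])
    return weather

sample_rows = ["Day,Type of Weather", "1,Sunny", "2,Rain", ""]   # hypothetical input
print(parse_weather_rows(sample_rows))   # prints "['Sunny', 'Rain']"
```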