In [1]:
# Import libraries
import numpy as np
import pandas as pd

from pprint import pprint

In [2]:
# Read data from file into a list of records (one tuple-like record per row)
baby_records = list(np.genfromtxt('data/baby_names.csv', delimiter=',', skip_header=1,
                                  encoding='utf-8', dtype=None))

In [3]:
baby_records[:2]

[(2011, 'FEMALE', 'HISPANIC', 'GERALDINE', 13, 75),
 (2011, 'FEMALE', 'HISPANIC', 'GIA', 21, 67)]

In [4]:
df = pd.read_csv('data/baby_names.csv')
print(df.head())

   BRITH_YEAR  GENDER  ETHNICTY       NAME  COUNT  RANK
0        2011  FEMALE  HISPANIC  GERALDINE     13    75
1        2011  FEMALE  HISPANIC        GIA     21    67
2        2011  FEMALE  HISPANIC     GIANNA     49    42
3        2011  FEMALE  HISPANIC    GISELLE     38    51
4        2011  FEMALE  HISPANIC      GRACE     36    53


In [5]:
girl_names = list(df[df.GENDER == 'FEMALE'].NAME.unique())
boy_names = list(df[df.GENDER == 'MALE'].NAME.unique())

baby_names_2011 = set(df[(df.BRITH_YEAR.isin([2011, 2012]))].NAME.str.title())
baby_names_2014 = set(df[(df.BRITH_YEAR.isin([2013, 2014]))].NAME.str.title())

# 01. Fundamental data types

This chapter will introduce you to the fundamental Python data types - lists, sets, and tuples. These data containers are critical as they provide the basis for storing and looping over ordered data. To make things interesting, you'll apply what you learn about these types to answer questions about the New York Baby Names dataset!

## 01.01 Introduction and lists

See the video.

In [6]:
# Accessing single items in list
cookies = ['chocolate chip', 'peanut butter', 'sugar']
cookies.append('Tirggel')
print(cookies)
print(cookies[2])

['chocolate chip', 'peanut butter', 'sugar', 'Tirggel']
sugar


In [7]:
# Combining Lists
cakes = ['strawberry', 'vanilla']
desserts = cookies + cakes
print(desserts)

['chocolate chip', 'peanut butter', 'sugar', 'Tirggel', 'strawberry', 'vanilla']


In [8]:
# Finding Elements in a List
position = cookies.index('sugar')
print(position)

print(cookies[position])

2
sugar


In [9]:
# Removing Elements in a List
name = cookies.pop(position)
print(name)
print(cookies)

sugar
['chocolate chip', 'peanut butter', 'Tirggel']


In [10]:
# Iterating over lists
for cookie in cookies:
    print(cookie)

chocolate chip
peanut butter
Tirggel


In [11]:
# Sorting lists
print(cookies)

sorted_cookies = sorted(cookies, key=str.lower)
print(sorted_cookies)

['chocolate chip', 'peanut butter', 'Tirggel']
['chocolate chip', 'peanut butter', 'Tirggel']


## 01.02 Manipulating lists for fun and profit

You may be familiar with adding individual data elements to a list by using the __.append()__ method. However, if you want to add all the items of another iterable (list, set, or tuple) to a list, you can use the __.extend()__ method on the list.

You can also use the __.index()__ method to find the position of an item in a list. You can then use that position to remove the item with the __.pop()__ method.
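As a minimal sketch of the methods described above, the following compares __.append()__ with __.extend()__ and shows a guarded __.index()__ lookup. The variable names and list contents are illustrative assumptions, not part of the exercise.

```python
# Illustrative names only; not loaded from the baby names dataset
names = ['Ximena', 'Aliza']

# .extend() accepts any iterable and adds its items one by one
names.extend(('Ayden', 'Calvin'))  # a tuple works as well as a list
print(names)

# .append() adds its argument as a single item, even if it is a list
nested = ['Ximena', 'Aliza']
nested.append(['Ayden', 'Calvin'])
print(len(nested))  # two strings plus one inner list

# .index() raises ValueError when the item is absent, so guard the lookup
if 'Rowen' in names:
    names.pop(names.index('Rowen'))
else:
    print("'Rowen' is not in the list")

# .pop() returns the item it removes
removed = names.pop(names.index('Aliza'))
print(removed, names)
```

Checking membership with `in` before calling __.index()__ avoids an exception when the item might be missing.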

In this exercise, you'll practice using all these methods!

**Instructions**

1. Create a list called baby_names with the names 'Ximena', 'Aliza', 'Ayden', and 'Calvin'.
2. Use the .extend() method on baby_names to add 'Rowen' and 'Sandeep' and print the list.
3. Use the .index() method to find the position of 'Aliza' in the list. Save the result as position.
4. Use the .pop() method with position to remove 'Aliza' from the list.
5. Print the baby_names list. This has been done for you, so hit 'Submit Answer' to see the results!

**Results:**<br>
<font color=darkgreen>Well done! Notice how the second time you print baby_names, 'Aliza' is no longer in the list.</font>

In [12]:
# Create a list containing the names: baby_names
baby_names = ['Ximena', 'Aliza', 'Ayden', 'Calvin']

# Extend baby_names with 'Rowen' and 'Sandeep'
baby_names.extend(['Rowen', 'Sandeep'])

# Print baby_names
print(baby_names)

# Find the position of 'Aliza': position
position = baby_names.index('Aliza')

# Remove 'Aliza' from baby_names
_ = baby_names.pop(position)

# Print baby_names
print(baby_names)

['Ximena', 'Aliza', 'Ayden', 'Calvin', 'Rowen', 'Sandeep']
['Ximena', 'Ayden', 'Calvin', 'Rowen', 'Sandeep']


## 01.03 Looping over lists

You can use a __for__ loop to iterate through all the items in a list. You can take that a step further with the __sorted()__ function, which sorts numbers from lowest to highest and strings in alphabetical order (by default, upper-case letters sort before lower-case ones).

The __sorted()__ function returns a new list and does not affect the list you passed into the function. You can learn more about __sorted()__ in the Python documentation (https://docs.python.org/3/library/functions.html#sorted).
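The behaviour of __sorted()__ described above can be sketched as follows. The sample values are illustrative assumptions; the `key` and `reverse` arguments are standard parameters of __sorted()__.

```python
# Illustrative values only
names = ['Zoey', 'AARON', 'Gia', 'aaliyah']

# Default string ordering is case-sensitive: upper-case letters sort first
default_order = sorted(names)
print(default_order)

# key=str.lower compares the names without regard to case
case_insensitive = sorted(names, key=str.lower)
print(case_insensitive)

# reverse=True sorts from highest to lowest
numbers_desc = sorted([13, 75, 21], reverse=True)
print(numbers_desc)

# The original list is unchanged; list.sort() would sort it in place instead
print(names)
```

This case sensitivity is why a mixed-case list of names sorts with all upper-case entries ahead of the title-case ones.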

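A list of lists, __records__, has been pre-loaded. (In this notebook the data is loaded in cell In [2] as __baby_records__, whose entries are tuple-like records with numeric fields stored as integers.) If you explore it in the IPython Shell, you'll see that each entry is a list of this form: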

<code>['2011', 'FEMALE', 'HISPANIC', 'GERALDINE', '13', '75']</code>

The name of the baby (__'GERALDINE'__) is the fourth entry of this list. Your job in this exercise is to loop over this list of lists and append the names of each baby to a new list called __baby_names__.

**Instructions**

1. Create an empty list called baby_names.
2. Use a for loop to iterate over each row of records appending the name, found in the fourth element of row, to baby_names.
3. Print each name in baby_names in alphabetical order. To do this, use the sorted() function as part of a for loop to iterate over the sorted names, printing each one.

**Results:**<br>
<font color=darkgreen>Wonderful. As you can see, baby_names is now in alphabetical order.</font>

In [13]:
# Create the empty list: baby_names
baby_names = []

# Loop over records 
for row in baby_records:
    # Add the name to the list
    baby_names.append(row[3])
    
# Sort the names in alphabetical order
sorted_baby_names = sorted(baby_names)

# Print first 30 names
print(sorted_baby_names[:30])

# Print last 10 names
print(sorted_baby_names[-10:])

['AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH', 'AALIYAH', 'AARAV', 'AARAV', 'AARAV', 'AARAV', 'AARAV', 'AARON', 'AARON', 'AARON', 'AARON', 'AARON', 'AARON', 'AARON', 'AARON', 'AARON', 'AARON', 'AARON', 'AARON', 'AARON', 'AARON', 'AARON']
['Zoey', 'Zoey', 'Zoey', 'Zoey', 'Zoey', 'Zoey', 'Zoey', 'Zoya', 'Zuri', 'Zuri']


## 01.04 Meet the Tuples

See the video.

## 01.05 Data type usage

Which data type would you use if you wanted your data to be immutable and ordered?

**Answer the question**
1. List.
2. String.
__3. Tuple.__
4. Set.

**Results:**<br>
<font color=darkgreen>Well done! Tuples are indeed immutable and ordered. You'll be using them a lot in this course!</font>

## 01.06 Using and unpacking tuples

Tuples are made up of several items just like a list, but they cannot be modified once created. It is very common for tuples to be used to represent data from a database. If you have a tuple like <code>('chocolate chip cookies', 15)</code> and you want to access each part of the data, you can use an index just like a list. However, you can also "unpack" the tuple into multiple variables, such as <code>type, count = ('chocolate chip cookies', 15)</code>, which sets type to 'chocolate chip cookies' and count to 15.
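A short sketch of indexing, unpacking, and immutability, using the tuple from the paragraph above. The variable name `kind` is an assumption used here in place of `type`, because `type` is also the name of a Python built-in.

```python
# The example tuple from the text
cookie = ('chocolate chip cookies', 15)

# Access a single element by index, as with a list
print(cookie[0])

# Unpack both elements into separate variables
kind, count = cookie
print(kind, count)

# Tuples are immutable: item assignment raises TypeError
try:
    cookie[1] = 20
except TypeError as exc:
    print('Cannot modify a tuple:', exc)
```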

Often you'll want to pair up multiple iterables. The __zip()__ function does just that. In Python 3 it returns an iterator of tuples, each containing one element from each iterable passed into __zip()__, and it stops as soon as the shortest input is exhausted.
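A minimal sketch of __zip()__ on two short lists of unequal length. The list names `girls` and `boys` are illustrative assumptions; the values are a few names from the dataset.

```python
girls = ['GERALDINE', 'GIA', 'GIANNA']
boys = ['AARAV', 'AARON']

# zip() returns a lazy iterator; list() collects its tuples
pairs = zip(girls, boys)
pair_list = list(pairs)
print(pair_list)

# Pairing stops at the shortest input, so 'GIANNA' has no partner
print(len(pair_list))

# The iterator is consumed after one pass
leftover = list(pairs)
print(leftover)
```

Because the iterator can be consumed only once, convert it to a list if the pairs are needed more than once.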

When looping over a list, you can also track your position in the list by using the __enumerate()__ function. On each iteration it yields the index of the current item together with the item itself.
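A short sketch of __enumerate()__, including its optional `start` argument, which is convenient when a rank should begin at 1 rather than 0. The list `names` is an illustrative assumption.

```python
names = ['GERALDINE', 'GIA', 'GIANNA']

# By default the index starts at 0
for idx, name in enumerate(names):
    print(idx, name)

# start=1 shifts the counter without changing the items
ranked = ['Rank {}: {}'.format(rank, name)
          for rank, name in enumerate(names, start=1)]
print(ranked)
```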

You'll practice using the __enumerate()__ and __zip()__ functions in this exercise, in which your job is to pair up the most common boy and girl names. Two lists - __girl_names__ and __boy_names__ - have been pre-loaded into your workspace.

**Instructions**

1. Use the zip() function to pair up girl_names and boy_names into a variable called pairs.
2. Use a for loop to loop through pairs, using enumerate() to keep track of your position. Unpack pairs into the variables idx and pair.
3. Inside the for loop, unpack pair into the variables girl_name and boy_name.
4. Still inside the loop, print the rank, girl name, and boy name, in that order. The rank is contained in idx.

**Results:**<br>
<font color=darkgreen>Excellent work! What are some of the most common girl names and boy names?</font>

In [14]:
print(len(girl_names), len(boy_names))

1512 1339


In [15]:
# Pair up the girl and boy names: pairs
pairs = zip(girl_names, boy_names)

# Iterate over pairs
result = []
for idx, pair in enumerate(pairs):
    # Unpack pair: girl_name, boy_name
    girl_name, boy_name = pair
    # Collect the rank and names associated with each rank
    result.append('Rank {}: {} and {}'.format(idx, girl_name, boy_name))
pprint(result[:10])

['Rank 0: GERALDINE and AARAV',
 'Rank 1: GIA and AARON',
 'Rank 2: GIANNA and ABDUL',
 'Rank 3: GISELLE and ABDULLAH',
 'Rank 4: GRACE and ADAM',
 'Rank 5: GUADALUPE and ADITYA',
 'Rank 6: HAILEY and ADRIAN',
 'Rank 7: HALEY and AHMED',
 'Rank 8: HANNAH and AIDAN',
 'Rank 9: HAYLEE and AIDEN']


## 01.07 Making tuples by accident

Tuples are very powerful and useful, and it's easy to make one by accident: all you have to do is end the value in an assignment with a comma. The assignment itself succeeds; the error only appears later, when you use the variable expecting it to be a string or a number.

You can verify the data type of a variable with the __type()__ function. In this exercise, you'll see for yourself how easy it is to make a tuple by accident.
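The following sketch shows how such an accidental tuple typically surfaces later, and one way to recover the intended value. The names `value` and `only` are illustrative assumptions.

```python
normal = 'simple'
error = 'trailing comma',  # the trailing comma makes this a one-item tuple

# len() counts characters for the string but items for the tuple
print(len(normal), len(error))

# String methods are missing on the tuple, so the mistake shows up here
try:
    error.upper()
except AttributeError as exc:
    print('Not a string:', exc)

# Recover the intended string by indexing or by unpacking
value = error[0]
(only,) = error
print(value, only)
```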

**Instructions**

1. Create a variable named normal and set it equal to 'simple'.
2. Create a variable named error and set it equal to 'trailing comma', including the comma after the closing quote.
3. Print the type of the normal and error variables.

**Results:**<br>
<font color=darkgreen>Great work! As you can see, the trailing comma caused error to be stored as a tuple instead of as a string. Watch out for those trailing commas!</font>

In [16]:
# Create the normal variable: normal
normal = 'simple'

# Create the mistaken variable: error
error = 'trailing comma',

# Print the types of the variables
print(type(normal))
print(type(error))

<class 'str'>
<class 'tuple'>


## 01.08 Sets for unordered and unique data

See the video.

In [17]:
# Creating Sets
cookies_eaten_today = ['chocolate chip', 'peanut butter',
                       'chocolate chip', 'oatmeal cream', 'chocolate chip']
types_of_cookies_eaten = set(cookies_eaten_today)
print(types_of_cookies_eaten)

{'chocolate chip', 'oatmeal cream', 'peanut butter'}


In [18]:
# Modifying Sets
types_of_cookies_eaten.add('biscotti')
types_of_cookies_eaten.add('chocolate chip')
print(types_of_cookies_eaten)

{'chocolate chip', 'oatmeal cream', 'peanut butter', 'biscotti'}


In [19]:
# Updating Sets
cookies_hugo_ate = ['chocolate chip', 'anzac']
types_of_cookies_eaten.update(cookies_hugo_ate)
print(types_of_cookies_eaten)

{'oatmeal cream', 'peanut butter', 'anzac', 'chocolate chip', 'biscotti'}


In [20]:
# Removing data from sets
types_of_cookies_eaten.discard('biscotti')
print(types_of_cookies_eaten)
print(types_of_cookies_eaten.pop())
print(types_of_cookies_eaten.pop())
print(types_of_cookies_eaten)

{'oatmeal cream', 'peanut butter', 'anzac', 'chocolate chip'}
oatmeal cream
peanut butter
{'anzac', 'chocolate chip'}


In [21]:
# Two sets
cookies_jason_ate = set(['chocolate chip', 'oatmeal cream',
'peanut butter'])
cookies_hugo_ate = set(['chocolate chip', 'anzac'])
print('Jason: ', cookies_jason_ate)
print('Hugo : ', cookies_hugo_ate)

Jason:  {'chocolate chip', 'oatmeal cream', 'peanut butter'}
Hugo :  {'chocolate chip', 'anzac'}


In [22]:
# Set Operations - Similarities
cookies_jason_ate = set(['chocolate chip', 'oatmeal cream',
'peanut butter'])
cookies_hugo_ate = set(['chocolate chip', 'anzac'])
# .union() returns every cookie eaten by at least one of them
print('Eaten by Jason or Hugo: ', cookies_jason_ate.union(cookies_hugo_ate))

Eaten by Jason or Hugo:  {'chocolate chip', 'oatmeal cream', 'peanut butter', 'anzac'}


In [23]:
# Set Operations - Differences
print('Not eaten by Hugo: ', cookies_jason_ate.difference(cookies_hugo_ate))
print('Not eaten by Jason: ', cookies_hugo_ate.difference(cookies_jason_ate))

Not eaten by Hugo:  {'oatmeal cream', 'peanut butter'}
Not eaten by Jason:  {'anzac'}


## 01.09 Finding all the data and the overlapping data between sets

Sets have several methods to combine, compare, and study them all based on mathematical set theory. The __.union()__ method returns a set of all the names found in the set you used the method on plus any sets passed as arguments to the method. You can also look for overlapping data in sets by using the __.intersection()__ method on a set and passing another set as an argument. It will return an empty set if nothing matches.
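A minimal sketch of __.union()__ and __.intersection()__ on small sets. The set names and contents are illustrative assumptions; the `|` and `&` operators are the built-in equivalents for two sets.

```python
names_a = {'Gia', 'Grace', 'Hannah'}
names_b = {'Grace', 'Hannah', 'Zoey'}
names_c = {'Aaron'}

# .union() accepts one or more sets and returns every distinct item
all_names = names_a.union(names_b, names_c)
print(sorted(all_names))

# .intersection() keeps only the items present in both sets
overlap = names_a.intersection(names_b)
print(sorted(overlap))

# With nothing in common, the result is an empty set
no_overlap = names_a.intersection(names_c)
print(no_overlap)

# Operator forms give the same results for two sets
print(names_a | names_b == names_a.union(names_b))
print(names_a & names_b == overlap)
```

The results are passed through sorted() before printing only because sets have no guaranteed display order.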

Your job in this exercise is to find the union and intersection in the names from 2011 and 2014. For this purpose, two sets have been pre-loaded into your workspace: __baby_names_2011__ and __baby_names_2014__.

One quirk in the baby names dataset is that names in 2011 and 2012 are all in upper case, while names in 2013 and 2014 are in title case (where the first letter of each name is capitalized). Consequently, if you were to compare the 2011 and 2014 data in this form, you would find no overlapping names between the two years! To remedy this, we converted the names in 2011 to title case using Python's __.title()__ method.

Real-world data can often come with quirks like this - it's important to catch them to ensure your results are meaningful.
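The case quirk described above can be reproduced with two tiny sets. This is a sketch with made-up set names and contents, not the dataset itself.

```python
# Upper-case names, as in the 2011 and 2012 rows
upper_names = {'GIA', 'GRACE'}
# Title-case names, as in the 2013 and 2014 rows
title_names = {'Gia', 'Grace', 'Zoey'}

# String comparison is case-sensitive, so nothing overlaps
raw_overlap = upper_names.intersection(title_names)
print(raw_overlap)

# Normalizing the case first makes the comparison meaningful
normalized = {name.title() for name in upper_names}
fixed_overlap = normalized.intersection(title_names)
print(sorted(fixed_overlap))
```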

**Instructions**

1. Combine all the names in baby_names_2011 and baby_names_2014 by computing their union. Store the result as all_names.
2. Print the number of names that occur in all_names. You can use the len() function to compute the number of names in all_names.
3. Find all the names that occur in both baby_names_2011 and baby_names_2014 by computing their intersection. Store the result as overlapping_names.
4. Print the number of names that occur in overlapping_names.

**Results:**<br>
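<font color=darkgreen>Wonderful work! As you can see from the output of len(overlapping_names), there are 987 overlapping names between the two sets in the original exercise. The sets built in this notebook (cell In [5]) combine 2011 with 2012 and 2013 with 2014, so the counts printed below differ.</font>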

In [24]:
# Find the union: all_names
all_names = baby_names_2011.union(baby_names_2014)

# Print the count of names in all_names
print(len(all_names))

# Find the intersection: overlapping_names
overlapping_names = baby_names_2011.intersection(baby_names_2014)

# Print the count of names in overlapping_names
print(len(overlapping_names))

1629
1182


## 01.10 Determining set differences

Another way of comparing sets is to use the __.difference()__ method. It returns all the items found in one set but not in another. It's important to remember that the set you call the method on is the one from which the items are returned. Unlike tuples, sets are mutable, so you can add items with the __.add()__ method. A set only adds an item that it does not already contain.
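A short sketch of how the direction of __.difference()__ matters, and of __.add()__ ignoring duplicates. The set names and contents are illustrative assumptions.

```python
names_2011 = {'Gia', 'Grace', 'Hannah'}
names_2014 = {'Grace', 'Zoey'}

# Items in names_2011 that are missing from names_2014
only_2011 = names_2011.difference(names_2014)
print(sorted(only_2011))

# Swapping the sets gives a different answer
only_2014 = names_2014.difference(names_2011)
print(sorted(only_2014))

# Adding an existing item leaves the set unchanged; a new item is added
names_2014.add('Grace')
names_2014.add('Aaron')
print(sorted(names_2014))
```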

In this exercise, you'll explore what names were common in 2011, but are no longer common in 2014. The set __baby_names_2014__ has been pre-loaded into your workspace. As in the previous exercise, the names have been converted to title case to ensure a proper comparison.

**Instructions**

1. Create an empty set called baby_names_2011. You can do this using set().
2. Use a for loop to iterate over each row in records. If the first column of a row is '2011', add its fourth column to baby_names_2011. Remember that Python is 0-indexed!
3. Find the difference between baby_names_2011 and baby_names_2014. Store the result as differences.
4. Print the differences. This has been done for you, so hit 'Submit Answer' to see the result!

**Results:**<br>
<font color=darkgreen>Excellent work, and congratulations, you've completed Chapter 1! Having learned about lists, tuples, and sets, you're now ready to learn all about dictionaries. See you in Chapter 2!</font>

In [25]:
# Create the empty set: baby_names_2011
baby_names_2011 = set()

# Loop over records and add the names from 2011 to the baby_names_2011 set
for row in baby_records:
    # Check if the first column is 2011 (an integer here, not the string '2011')
    if row[0] == 2011:
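        # Add the fourth column to the set. Names in the 2011 rows are upper case here,
        # so row[3].title() would be needed to match the title-cased baby_names_2014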
        baby_names_2011.add(row[3])

# Find the difference between 2011 and 2014: differences
differences = baby_names_2011.difference(baby_names_2014)

# Print the number of differences
print(len(differences))

1206


# Additional material

- **Datacamp course**: https://learn.datacamp.com/courses/data-types-for-data-science-in-python
- **Sorted**: https://docs.python.org/3/library/functions.html#sorted
- **Example of sorted**: https://docs.python.org/3/howto/sorting.html#sortinghowto