# Differences and Applications of List, Tuple, Set and Dictionary in Python


List: A list is like the dynamically sized arrays found in other languages (vector in C++ and ArrayList in Java). A list need not be homogeneous, which makes it one of the most flexible tools in Python.

Tuple: A Tuple is a collection of Python objects separated by commas. In some ways a tuple is similar to a list in terms of indexing, nested objects and repetition, but a tuple is immutable whereas a list is mutable.

Set: A Set is an unordered collection data type that is iterable, mutable and has no duplicate elements. Python’s set class represents the mathematical notion of a set.

Dictionary: A Dictionary is a collection of key:value pairs, used to store data like a map. Unlike the other data types, which hold a single value per element, each element of a dictionary is a key:value pair. Since Python 3.7 a dictionary preserves insertion order; earlier versions did not guarantee any order. Values are looked up through their key, which is hashed, so retrieval by key is fast.

List, Tuple, Set, and Dictionary are the built-in data structures in Python used to store and organize data efficiently.
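
As a quick orientation, the four types can be put side by side. This is a minimal sketch with made-up values; the variable names are illustrative and do not appear elsewhere in this notebook.

```python
# Illustrative values only; not taken from the cells below.
a_list = ['one', 1, True]        # ordered, mutable, allows duplicates
a_tuple = ('one', 1, True)       # ordered, immutable
a_set = {'one', 'two', 'one'}    # unordered, duplicates collapse into one item
a_dict = {'one': 1, 'two': 2}    # key:value pairs, looked up by key

print(type(a_list).__name__, len(a_list))    # list 3
print(type(a_tuple).__name__, len(a_tuple))  # tuple 3
print(type(a_set).__name__, len(a_set))      # set 2
print(type(a_dict).__name__, a_dict['two'])  # dict 2
```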

# List
A list can contain different data types.  
Using "+" concatenates two lists into a new list.
Use lists for a collection of values where order matters and you want to select subsets by slicing.
Use extend() to append all the items of another list (or any iterable) in place.
Use pop() to remove and return an item by index (the last item by default).
Use the sorted() function to get a sorted copy of a list.

In [22]:
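from collections import Counter

# A list can mix strings, numbers and booleans
my_list = ['one', 1, True]
print(my_list.index('one'))
# "+" builds a new list; my_list itself is unchanged
my_list2 = my_list + ['two', 2, False]
print(my_list2)
# extend() appends the items in place
my_list2.extend(['three', 3])
# pop() removes the item at the given index
my_list2.pop(my_list2.index('two'))
print(my_list2)

alpha = ['Z', 'A', 'D', 'C', 'B', 'D']
# sorted() returns a new sorted list
print(sorted(alpha))
# Counter tallies how often each item occurs
print(Counter(alpha))
print(Counter(alpha).most_common(3))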

0
['one', 1, True, 'two', 2, False]
['one', 1, True, 2, False, 'three', 3]
['A', 'B', 'C', 'D', 'D', 'Z']
Counter({'D': 2, 'Z': 1, 'A': 1, 'C': 1, 'B': 1})
[('D', 2), ('Z', 1), ('A', 1)]
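
The notes above mention sorting and selecting subsets. The sketch below contrasts sorted() with the in-place list.sort() method and shows a slice. The variable names are illustrative and are not part of the cell above.

```python
# Illustrative names; not part of the original notebook cell.
letters = ['Z', 'A', 'D', 'C', 'B', 'D']

# sorted() returns a new list and leaves the original untouched
ordered = sorted(letters)

# Slicing selects a subset: the first three items of the sorted copy
first_three = ordered[:3]

# list.sort() sorts in place and returns None
in_place = list(letters)   # copy so that `letters` stays as it was
result = in_place.sort()

print(letters)      # ['Z', 'A', 'D', 'C', 'B', 'D']
print(ordered)      # ['A', 'B', 'C', 'D', 'D', 'Z']
print(first_three)  # ['A', 'B', 'C']
print(result)       # None
```

Reach for sorted() when the original order still matters, and for sort() when it does not and a copy would be wasted.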


# Tuple
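A tuple is immutable, holds data in order, can be indexed, is handy for pairing values, and can be unpacked.
Tuples are usually written with parentheses, although it is the commas that create the tuple.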


In [5]:
# Use zip to pair the items of 2 lists into tuples
girl_names = ['Kerry', 'Ria', 'Emily', 'Mary', 'Patricia']
boy_names = ['Mike', 'Adam', 'Sean', 'Paul', 'Fred']
pairs = zip(girl_names, boy_names)
# Iterate over pairs
for idx, pair in enumerate(pairs):
    # Unpack pair: girl_name, boy_name
    girl_name, boy_name = pair
    # Print the rank and names associated with each rank
    print('Rank {}: {} and {}'.format(idx, girl_name, boy_name))

Rank 0: Kerry and Mike
Rank 1: Ria and Adam
Rank 2: Emily and Sean
Rank 3: Mary and Paul
Rank 4: Patricia and Fred
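
The cell above shows pairing and unpacking, but not immutability. The following sketch, using made-up values, shows what happens when code tries to change a tuple.

```python
# Illustrative values; not taken from the notebook cell above.
pair = ('Kerry', 'Mike')

# Indexing works as with a list
first = pair[0]

# Unpacking assigns each element to its own name
girl_name, boy_name = pair

# Item assignment is rejected because tuples are immutable
try:
    pair[0] = 'Ria'
    changed = True
except TypeError:
    changed = False

# Parentheses are optional: the commas create the tuple
also_a_tuple = 'Ria', 'Adam'

print(first, girl_name, boy_name)    # Kerry Kerry Mike
print(changed)                       # False
print(type(also_a_tuple) is tuple)   # True
```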


# Set
A set holds unique items, is unordered and mutable, and is Python's implementation of sets from mathematical set theory
add() method adds a single item to a set (if the item is already present, nothing happens)
update() method merges in the items of another set, list or other iterable
discard() method removes an item from the set by value (no error if the item is missing)
pop() method removes and returns an arbitrary element from the set
union() method returns a set of all the items found in either of 2 sets (or)
intersection() method returns the items found in both of 2 sets (and)
difference() method returns the items present in one set that are not in another set

In [9]:
baby_names_2011 = {'Kerry', 'Ria', 'Emily', 'Mary', 'Patricia', 'Mike', 'Sean'}
baby_names_2014 = {'Mike', 'Adam', 'Sean', 'Paul', 'Fred', 'Kerry', 'Emily'}
# Find the union: all_names
all_names = baby_names_2011.union(baby_names_2014)

# Print the count of names in all_names
print(len(all_names))

# Find the intersection: overlapping_names
overlapping_names = baby_names_2011.intersection(baby_names_2014)

# Print the count of names in overlapping_names
print(len(overlapping_names))

# Find the difference between 2011 and 2014: differences
differences = baby_names_2011.difference(baby_names_2014)

# Print the differences
print(differences)

10
4
{'Mary', 'Patricia', 'Ria'}
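
The cell above covers union(), intersection() and difference(). The remaining methods from the notes (add(), update(), discard() and pop()) could be exercised as in this sketch. The set `names` is illustrative, and remove() is included only as a contrast to discard().

```python
# Illustrative set; not one of the baby-name sets above.
names = {'Kerry', 'Ria'}

names.add('Emily')    # new item is added
names.add('Kerry')    # already present, so nothing changes
size_after_add = len(names)

names.update(['Mary', 'Ria'])   # merge in a list; duplicates are ignored
size_after_update = len(names)

names.discard('Mary')   # removes 'Mary'
names.discard('Paul')   # 'Paul' is absent; discard() does not raise

# remove() is the strict counterpart: it raises KeyError for a missing item
try:
    names.remove('Paul')
    removed = True
except KeyError:
    removed = False

popped = names.pop()    # removes and returns an arbitrary element

print(size_after_add, size_after_update, removed)   # 3 4 False
print(popped in names)                               # False
print(len(names))                                    # 2
```

Because a set is unordered, which element pop() returns should not be relied on.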


# NumPy Array
A NumPy array holds a single data type; if the items differ, NumPy converts them to a common data type (e.g. string).
Using "+" performs element-wise addition, not concatenation; use np.append() or np.concatenate() to join arrays.
Define an array with a list of lists to get a 2-D array of records with several fields each; mixed types are still converted to one common type.

In [12]:
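import numpy as np
my_np_array = np.array([1, 2, 3])
print(my_np_array)
# "+" adds element by element: [1+4, 2+5, 3+6]
my_np_array2 = my_np_array + [4, 5, 6]
print(my_np_array2)
# np.append() returns a new array with the values joined on the end
my_np_array2 = np.append(my_np_array2, [7, 8, 9])
print(my_np_array2)
print(my_np_array2[3:])

# Mixed types are converted to strings, so the ages are stored as text
my_family = np.array([["Dad", 49, "M"], ["Mom", 48, "F"], ["Adam", 25, "M"]])
# Row 1, column 2
print(my_family[1][2])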

[1 2 3]
[5 7 9]
[5 7 9 7 8 9]
[7 8 9]
F
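
To make the type conversion visible, the sketch below inspects the dtype NumPy chooses for mixed input and contrasts "+" with np.concatenate(). The array names are illustrative, and the sketch assumes NumPy is installed.

```python
import numpy as np

# Mixing numbers and strings: NumPy picks one common dtype (a Unicode string type)
mixed = np.array([1, 'two', 3.0])
print(mixed.dtype.kind)    # U  (Unicode string)

# Mixing ints and floats: the ints are upcast to float
numbers = np.array([1, 2, 3.5])
print(numbers.dtype)       # float64

# "+" adds element by element; np.concatenate() joins the arrays end to end
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)                    # [5 7 9]
print(np.concatenate([a, b]))   # [1 2 3 4 5 6]
```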


# Dictionary
Dictionaries use key:value pairs.
Dictionaries use curly braces.
Keys must be unique and hashable, which in practice means immutable types such as strings and numbers.
Use dictionaries where you want to look up values using unique keys.

In [17]:
my_family_dictionary = {"Dad":49, 'Mom':48, 'Adam':25}
print(my_family_dictionary)
print(my_family_dictionary.keys())
print(my_family_dictionary['Adam'])
my_family_dictionary['Ria'] = 22
print(my_family_dictionary)
print('Ria' in my_family_dictionary)
del my_family_dictionary['Adam']
print(my_family_dictionary)
print(sorted(my_family_dictionary, reverse=True)[:2])

# Dictionary of dictionaries
europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
           'france': { 'capital':'paris', 'population':66.03 },
           'germany': { 'capital':'berlin', 'population':80.62 },
           'norway': { 'capital':'oslo', 'population':5.084 } }


# Print out the capital of France
print(europe['france']['capital'])
print(europe.get('germany'))
print(type(europe.get('ireland')))
print(europe.get('ireland', 'Not Found'))
print(europe.keys())
print(europe['norway'].keys())

# add item to dictionary
england = {'capital':'london', 'population':70.543}
europe['england'] = england
# add items to dictionary using update() method with a list of (key, value) tuples
europe['italy'] = {}
europe['italy'].update([('capital', 'rome'), ('population', 52.232)])
print(europe)

# remove items from dictionary - del raises KeyError if the key is missing (wrap it in try/except) - pop() with a default is safe
france = europe.pop('france')
ukraine = europe.pop('ukraine', {})
del europe['spain']
print(europe)

{'Dad': 49, 'Mom': 48, 'Adam': 25}
dict_keys(['Dad', 'Mom', 'Adam'])
25
{'Dad': 49, 'Mom': 48, 'Adam': 25, 'Ria': 22}
True
{'Dad': 49, 'Mom': 48, 'Ria': 22}
['Ria', 'Mom']
paris
{'capital': 'berlin', 'population': 80.62}
<class 'NoneType'>
Not Found
dict_keys(['spain', 'france', 'germany', 'norway'])
dict_keys(['capital', 'population'])
{'spain': {'capital': 'madrid', 'population': 46.77}, 'france': {'capital': 'paris', 'population': 66.03}, 'germany': {'capital': 'berlin', 'population': 80.62}, 'norway': {'capital': 'oslo', 'population': 5.084}, 'england': {'capital': 'london', 'population': 70.543}, 'italy': {'capital': 'rome', 'population': 52.232}}
{'germany': {'capital': 'berlin', 'population': 80.62}, 'norway': {'capital': 'oslo', 'population': 5.084}, 'england': {'capital': 'london', 'population': 70.543}, 'italy': {'capital': 'rome', 'population': 52.232}}
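
The difference between the failing and the safe ways of handling a missing key can be sketched as follows. The dictionary `ages` is illustrative and reuses values from the first example only for familiarity.

```python
# Illustrative dictionary; not one of the dictionaries defined above.
ages = {'Dad': 49, 'Mom': 48}

# del (like plain indexing) raises KeyError for a missing key
try:
    del ages['Ria']
    deleted = True
except KeyError:
    deleted = False

# get() and pop() accept a default that is returned instead of raising
missing_age = ages.get('Ria', 'Not Found')
popped = ages.pop('Ria', None)

# New keys keep their insertion order (guaranteed since Python 3.7)
ages['Adam'] = 25
key_order = list(ages)

print(deleted)       # False
print(missing_age)   # Not Found
print(popped)        # None
print(key_order)     # ['Dad', 'Mom', 'Adam']
```

Note that pop() without a default raises KeyError just like del does, so the default argument is what makes it safe.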


# Pandas DataFrames
Pandas is built on NumPy.
A DataFrame is used to store tabular data where you can label rows and columns.
DataFrames can be built from dictionaries or from CSV files.
For CSV imports, define the row labels using 'index_col'.
Example: cars = pd.read_csv('cars.csv', index_col=0)
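
Since the file 'cars.csv' is not included here, the effect of index_col can be sketched with an in-memory stand-in. The CSV text below is hypothetical (its values are borrowed from the lists in the next cell), and `cars_sample` is an illustrative name. The sketch assumes pandas is installed.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as 'cars.csv';
# the first (unnamed) column holds the row labels.
csv_text = """,country,drives_right,cars_per_cap
US,United States,True,809
AUS,Australia,False,731
JPN,Japan,False,588
"""

# index_col=0 turns the first column into the row labels
cars_sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(cars_sample.index))                  # ['US', 'AUS', 'JPN']
print(list(cars_sample.columns))                # ['country', 'drives_right', 'cars_per_cap']
print(cars_sample.loc['JPN', 'cars_per_cap'])   # 588
```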

In [31]:
# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

# Import pandas as pd
import pandas as pd

# Create dictionary my_dict with three key:value pairs: my_dict
my_dict = {'country':names, 'drives_right':dr, 'cars_per_cap':cpc}

# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)

# Print cars
print(cars)

# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# Specify row labels of cars
cars.index = row_labels

# Print cars again
print(cars)

# Print country column as a Pandas Series (single bracket)
print('Series..........')
print(cars['country'])
print(cars.loc['JPN'])
print(cars.iloc[1])
# print(cars.loc['JPN', 'EG']) # KeyError: read as row 'JPN', column 'EG'; pass a list to select multiple rows
# print(cars.iloc[2, 6])       # IndexError: read as row 2, column 6; pass a list to select multiple rows
# Print out drives_right value of Morocco
print(cars.loc['MOR', 'drives_right'])
# Print out drives_right column as Series
print(cars.loc[:, 'drives_right'])

# Print country column as a Pandas DataFrame (double bracket)
print('DataFrame..........')
print(cars[['country']])
print(cars[['country', 'drives_right']])
print(cars.loc[['JPN']])
print(cars.iloc[[1]])
print(cars.loc[['JPN', 'EG']])
print(cars.iloc[[2, 6]])
# Print out fourth, fifth and sixth observation (rows)
print(cars[3:6])
# Print sub-DataFrame
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])
# Print out drives_right column as DataFrame
print(cars.loc[:, ['drives_right']])
# Print out cars_per_cap and drives_right as DataFrame
print(cars.loc[:, ['cars_per_cap', 'drives_right']])

         country  drives_right  cars_per_cap
0  United States          True           809
1      Australia         False           731
2          Japan         False           588
3          India         False            18
4         Russia          True           200
5        Morocco          True            70
6          Egypt          True            45
           country  drives_right  cars_per_cap
US   United States          True           809
AUS      Australia         False           731
JPN          Japan         False           588
IN           India         False            18
RU          Russia          True           200
MOR        Morocco          True            70
EG           Egypt          True            45
Series..........
US     United States
AUS        Australia
JPN            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object
country         Japan
drives_right    False
cars_per_cap      588
Name: