## Data Structures


We have covered the basics of Python's primitive data types in detail. It's now useful to consider how these basic types can be collected in ways that are meaningful and useful for a variety of tasks. Data structures are a fundamental component of programming: a data structure is a collection of data elements that adheres to certain properties, depending on its type. In these notes, we'll present three basic data structures, the list, the set, and the dictionary, along with a brief look at the tuple. Python's data structures are very rich, and a full treatment is beyond the scope of this simple primer. Please see [the documentation](http://docs.python.org/2/tutorial/datastructures.html) for a more complete view.


### List:

(Readings: LPTHW, Examples 32-34, and 38)

A list, sometimes called an array or a vector, is an ordered collection of values. The value of a particular element in a list is retrieved by its index, that is, its position in the list. Lists allow duplicate values, but indices are unique. In Python, as in most programming languages, list indices start at 0: to get the first element in a list, request the element at index 0. Lists provide very fast access to elements at specific positions, but are inefficient at "membership queries," that is, determining whether an element is in the list, because in the worst case every element has to be checked.

In python, lists are specified by square brackets, `[ ]`, containing zero or more values, separated by commas. Lists are the most common data structure, and are often generated as a result of other functions, for instance:

`a_string.split(" ")`

will take a string, split it on space, and then return a list of the smaller substrings.

To query a specific value from a list, pass the requested index in square brackets following the name of the list. Negative indices can be used to traverse the list from the right. (Remember how we accessed the individual characters of a string? It works exactly the same way: strings can be indexed and sliced like lists of characters, although, unlike lists, they cannot be modified in place.)
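As a small sketch of the comparison with strings: indexing and slicing behave the same way on a string and on a list, but only the list can be changed in place. The variable names here are illustrative, and `print(...)` is written with parentheses and a single argument so that the snippet runs under both Python 2 and Python 3.

```python
word = "python"
letters = list(word)          # ['p', 'y', 't', 'h', 'o', 'n']

# Indexing and slicing work the same way on both
first_of_word = word[0]       # 'p'
last_of_list = letters[-1]    # 'n'
tail = word[-3:]              # 'hon'

# Lists can be modified in place; strings cannot
letters[0] = "P"
try:
    word[0] = "P"
    string_is_mutable = True
except TypeError:
    string_is_mutable = False

print(first_of_word)
print(last_of_list)
print(tail)
print(letters)
print(string_is_mutable)
```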

In [1]:
my_string = "Wow these data structures make for exciting dinner conversation"
list_of_words = my_string.split(" ")
print list_of_words

['Wow', 'these', 'data', 'structures', 'make', 'for', 'exciting', 'dinner', 'conversation']


In [2]:
a_list = [1, 2, 3, 0, 5, 10, 11]
print a_list

[1, 2, 3, 0, 5, 10, 11]


In [3]:
empty_list = []
print empty_list

[]


In [4]:
mixed_list = [1, "a"]
print mixed_list

[1, 'a']


In [7]:
another_list = ["a", "b", "c"]
print another_list[0]
print another_list[1]
print another_list[2]

a
b
c


In [11]:
a_list = [1, 2, 3, 0, 5, 10, 11]
print a_list[-1] # indexing from the right

11


In [12]:
print a_list[-3:]

[5, 10, 11]


Some common functionality of lists:

+ `list.append(x)`: add an element to the end of a list
+ `list_1.extend(list_2)`: add all elements in the second list to the end of the first list
+ `list.insert(index, x)`: insert element x into the list at the specified index. Elements to the right of this index are shifted over
+ `list.pop(index)`: remove the element at the specified position and return it
+ `list.index(x)`: looks through the list to find the specified element, returning its position if it's found, else throws an error
+ `list.count(x)`: counts the number of occurrences of the input element
+ `list.sort()`: sorts the list of items in place
+ `list.reverse()`: reverses the order of the list
+ `len(list)`: returns the number of elements in the list
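The operations listed above can be sketched on a small list. The list `scores` is a made-up example; the comments show the state of the list after each step.

```python
scores = [3, 1, 2]

scores.insert(1, 10)        # [3, 10, 1, 2]
removed = scores.pop(0)     # removes and returns 3 -> [10, 1, 2]
position = scores.index(1)  # 1, the position of the value 1
scores.append(1)            # [10, 1, 2, 1]
ones = scores.count(1)      # 2
scores.sort()               # [1, 1, 2, 10]
scores.reverse()            # [10, 2, 1, 1]

# index() raises ValueError when the element is absent
try:
    scores.index(99)
    found = True
except ValueError:
    found = False

print(scores)
print(len(scores))
print(found)
```

Note that `sort()` and `reverse()` change the list itself and return `None`, so writing `scores = scores.sort()` would lose the list.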

In [17]:
a_list = ["Panos", "John", "Chris", "Josh", "Mary", "Anna"]

print a_list

a_list.append("Elena")
a_list.append("Marvin")

print a_list

['Panos', 'John', 'Chris', 'Josh', 'Mary', 'Anna']
['Panos', 'John', 'Chris', 'Josh', 'Mary', 'Anna', 'Elena', 'Marvin']


In [None]:
b_list = []
print "Length of list a_list:", len(a_list)
b_list.extend(a_list)
b_list.extend(a_list)
b_list.extend(a_list)
print b_list
print "Length of list b_list:", len(b_list)

In [None]:
b_list = []
print "Length of list a_list:", len(a_list)
b_list.append(a_list)
b_list.append(a_list)
b_list.append(a_list)
print b_list
print "Length of list b_list:", len(b_list)
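The difference between `extend` and `append` in the two cells above can be seen more easily on a short list. This is a minimal sketch with a made-up `names` list: `extend` adds each element of its argument, while `append` adds the argument itself as a single element.

```python
names = ["Panos", "John"]

extended = []
extended.extend(names)   # adds each element of names
extended.extend(names)

appended = []
appended.append(names)   # adds the list names itself as one element
appended.append(names)

print(extended)          # a flat list of four strings
print(len(extended))
print(appended)          # a list containing two lists
print(len(appended))
```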

In [None]:
b_list.sort()
print b_list

#### Exercise

* Add the letter "d" to `another_list` and print the result
* Add the letter "c" to `another_list` and print the result
* If you search for "c" in `another_list` using the `list.index(x)` method, what is the result?
* Sort `another_list` and print the result
* Use the `split()` operation for strings (that we learned before) and count the number of words in the sentence "Python is the word. And on and on and on and on..." 

In [None]:
# your code here

#### Exercise

* What is the length of the document below in characters? In words? In paragraphs?
* What is the average length of a word?

In [23]:
washington_post = """MOSCOW — Russian officials vehemently defended the country’s airstrikes in Syria on Thursday as blows to Islamic State militants even as evidence mounted suggesting that U.S.-backed rebels and others were facing the brunt of Moscow’s attacks.

And while Russian officials and diplomats rallied behind President Vladi­mir Putin, the Kremlin’s stance appeared further clouded by acknowledgments that the missions have already extended beyond solely the Islamic State.

In Paris, the Russian ambassador to France, Alexander Orlov, said the Russian attacks also targeted an al-Qaeda-linked group, Jabhat al-Nusra, or al-Nusra Front.

Syria’s ambassador to Russia, Riad Haddad, echoed that the joint hit list for Russia and the Syrian government included Jabhat al-Nusra, which is believed to have some coordination with the Islamic State but is still seen mostly as a rival.

“We are confronting armed terrorist groups in Syria, regardless of how they identify themselves, whether it is Jabhat al-Nusra, the ISIL or others,” he said, using one of the acronyms for the Islamic State.

Graphic Did the Russians really strike the Islamic State? VIEW GRAPHIC 
“They all are pursuing ISIL ends,” he added, according to the Interfax news agency.

The ambassadors did not specifically mention any U.S.- and Western-backed rebel groups.

But the comment was certain to deepen suspicions by Washington and allies that Putin’s short-term aim is to give more breathing space to Syria’s embattled President Bashar al-Assad, whose government is strongly supported by Moscow.

Syrian activists, meanwhile, ramped up their own claims that Moscow was hitting groups seeking to bring down Assad, who has managed to hang on during more than four years of civil war.

Russia’s expanding military intervention in Syria added urgency to separate efforts by Russia and U.S. officials to coordinate strategies against the Islamic State and avoid potential airspace missteps between the two powers — so-called “deconfliction” talks. The Pentagon said the discussions will begin Thursday.

[Washington weighs next move]

One monitoring group, the Britain-based Syrian Observatory for Human Rights, said Russian airstrikes again struck strongholds of an American-backed rebel group, Tajamu Alezzah, in central Hama province.

Ground level: On the scene of controversial Russian airstrikes in Syria	
View Photos	The actions, quickly criticized by Washington, add an unpredictable element to a multilayered war.
The observatory also reported that airstrikes hit the northwestern city Jisr al-Shughour, which is in the hands of rebel groups including al-Nusra, after battles last month to drive back Assad’s forces.

Among the locations hit was a site near Kafr Nabl, the northern Syrian town whose weekly protests against the government, often featuring pithy slogans in English, won it renown as a symbol of what began as a peaceful protest movement against the Assad regime. The local council receives U.S. assistance, and the rebel unit there has received support under a covert CIA program aimed at bolstering moderate rebels.

Raed Fares, one of the leaders of the protest movement in Kafr Nabl, said warplanes struck a Free Syrian Army checkpoint guarding Roman ruins on the outskirts of the town. He said the explosion was bigger than anything local residents had seen in three years of airstrikes conducted by Syrian warplanes.

“It made a fire six kilometers wide,” he told The Washington Post.

Other sites hit on the second day of Russian bombing included locations in the province of Hama. The targets suggested the main intention of the strikes was to shore up government control over a corridor of territory linking the capital, Damascus, to the Assad family’s coastal heartland, where the Russians are operating out of an expanded air base.

Syrian rebels, some of them U.S.-backed, had been making slow but steady gains in the area, considered one of the government’s biggest vulnerabilities. There has been no Islamic State presence there since January 2014, when moderate rebels rose up against the extremists and forced them to retreat to eastern Syria.

[Kerry warns of ‘grave concerns’ about Russia’s intent]

In Washington, Sen. John McCain (R-Ariz.) told CNN he could “absolutely confirm” that airstrikes hit Western-backed groups such as the Free Syrian Army and other factions “armed and trained by the CIA.”

“We have communications with people there,” said McCain, chairman of the Senate Armed Services Committee.

The accounts could not be independently assessed, but the main focus of the Russian attacks appeared to be in areas not known to have strong Islamic State footholds.

In Moscow, the reply was blunt.

“Total rubbish,” Gennady Zyuganov, a member of parliament and leader of Russia’s Communist Party, said of the U.S. accusations.

In televised remarks Thursday, Putin called accusations that Russian airstrikes had killed civilians in Syria “information attacks.”

He also addressed concerns about an accidental military clash between Russian and U.S.-led coalition forces, saying that his intelligence and military agencies were “establishing contacts” with counterparts in the United States.

“This work is ongoing, and I hope that it will conclude with the creation of a regularly acting mechanism,” he said.

A spokesman for Russia’s Defense Ministry, Igor Konashenkov, said Thursday that warplanes hit a dozen Islamic State sites in the past 24 hours, destroying targets including a command center and two arms depots.

[Russia’s strategy in Syria could be a work in progress]

The United States and Russia agree on the need to fight the Islamic State but not about what to do with the Syrian president. The Syrian civil war, which grew out of an uprising against Assad, has killed more than 250,000 people since March 2011 and sent millions of refugees fleeing to countries in the Middle East and Europe.

Accusing Russia of “pouring gasoline on the fire,” Defense Secretary Ashton B. Carter vowed that U.S. pilots would continue their year-long bombing campaign against the Islamic State in Syria, despite Moscow’s warning that American planes should stay away from its operations.

“I think what they’re doing is going to backfire and is counterproductive,” Carter said on Wednesday.

Yet Russia’s military flexing in Syria brought quick overtures from neighboring Iraq, where the Islamic State also holds significant territory but the government is within Washington’s fold.

Iraq’s prime minister, Haider al-Abadi, told France 24 that he “would welcome” Russia joining the U.S.-led airstrikes against Islamic State targets, but there have been no specific discussions.

Joining the protests against the Russian airstrikes was Saudi Arabia, a leading foe of Assad and one of Washington’s top Middle East allies.

At the United Nations late Wednesday, the Saudi ambassador, Abdallah al-Mouallimi, demanded that the Russian air campaign “stop immediately” and accused Moscow of carrying out attacks in areas outside the control of the Islamic State.

In Iran, Assad’s main regional backer, Foreign Ministry spokeswoman Marzieh Afkham called Russia’s military role a step “toward resolving the current crisis” in Syria.

Sly reported from Beirut, and Murphy from Washington. Daniela Deane in London, William Branigin in Washington and Loveday Morris in Baghdad contributed to this report.
"""

In [25]:
# Your code here
num_characters = len(washington_post)
print "The number of characters in the article is:", num_characters

The number of characters in the article is: 7520


In [27]:
words = washington_post.split(" ")
# print words
num_words = len(words)
print "The number of words in the article is:", num_words

The number of words in the article is: 1097


In [31]:
number_of_spaces = washington_post.count(" ")
print number_of_spaces
average_word_length = 1.0*(num_characters-number_of_spaces) / num_words
print average_word_length

1096
5.85597082954


In [36]:
len(washington_post.split("\n\n"))

38
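One caveat about the counts above: `split(" ")` splits only on single space characters, so words separated by a newline stay glued together and consecutive spaces produce empty strings. A possible alternative is `split()` with no argument, which splits on any run of whitespace. The sketch below uses a short made-up `sample` string rather than the article, so the numbers reported above are not recomputed here.

```python
sample = "one two\nthree  four\n\nfive"

by_space = sample.split(" ")       # newlines stay inside the "words"
by_whitespace = sample.split()     # splits on any run of whitespace
paragraphs = sample.split("\n\n")  # blank lines separate paragraphs

print(by_space)
print(len(by_space))
print(by_whitespace)
print(len(by_whitespace))
print(len(paragraphs))
```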

### Set:

A set is a data structure where all elements are unique. Sets are unordered. In fact, the order of the elements observed when printing a set might change at different points during a program's execution, depending on the state of Python's internal representation of the set. Sets are ideal for membership queries, for instance: is a user among those users who have received a promotion?

Sets are specified by curly braces, `{ }`, containing one or more comma-separated values. To specify an empty set, use the alternative construct `set()`; empty braces, `{}`, create an empty dictionary instead.

In [39]:
# creating sets
some_set = {4, 1, 2, 3, 4, 4, 4, 4}
another_set = {4, 5, 6}

In [38]:
print some_set

set([1, 2, 3, 4])


In [40]:
# creating an empty set; notice that we do *not* use "empty_set = {}",
# as one might expect from the way we create an empty list, because {} creates an empty dictionary
empty_set = set()
empty_list = []
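A quick way to confirm why `set()` is needed is to check the types involved. This is a small sketch; the variable names are illustrative.

```python
empty_braces = {}      # an empty dictionary, not a set
empty_set = set()      # an empty set

braces_are_dict = isinstance(empty_braces, dict)
braces_are_set = isinstance(empty_braces, set)
set_is_set = isinstance(empty_set, set)

print(braces_are_dict)
print(braces_are_set)
print(set_is_set)
```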

We can also create a set from a list:

In [44]:
my_list = [1, 2, 3, 0, 5, 10, 11, 1, 5]
my_set = set(my_list)
print my_list
print my_set
print "Length of set", len(my_set)
print "Length of list", len(my_list)

[1, 2, 3, 0, 5, 10, 11, 1, 5]
set([0, 1, 2, 3, 5, 10, 11])
Length of set 7
Length of list 9


#### Exercise 

* What is the number of distinct words in the `washington_post` variable (defined above)?

In [50]:
# your code here
words = washington_post.split(" ")
num_words = len(words)
print "Number of words:", num_words
distinct = len(set(words))
print "Distinct words:", distinct

Number of words: 1097
Distinct words: 607


#### Checking for membership in a set

The easiest way to check for membership in a set is to use the `in` keyword, checking if a needle is "`in`" the set.

In [51]:
my_set = {1, 2, 3, 4}

In [55]:
val = 1
result = val in my_set
print "The value", val, "appears in the variable my_set:", result

The value 1 appears in the variable my_set: True


In [56]:
val = 0
result = val in my_set
print "The value", val ,"appears in the variable my_set:", result

The value 0 appears in the variable my_set: False


We also have the "`not in`" operator, which tests the opposite:

In [59]:
val = 5
result = (val not in some_set)
print "Value %d does *not* appear in some_set:" % val, result
val = 1
result = (val not in some_set)
print "Value %d does *not* appear in some_set:" % val, result


Value 5 does *not* appear in some_set: True
Value 1 does *not* appear in some_set: False


#### Set operators: Add, remove elements; Union, intersection, subset

Some other common set functionality:

+ `set_a.add(x)`: add an element to a set
+ `set_a.remove(x)`: remove an element from a set; raises an error if the element is not present
+ `set_a - set_b`: elements in a but not in b. Equivalent to `set_a.difference(set_b)`
+ `set_a | set_b`: elements in a or b. Equivalent to `set_a.union(set_b)`
+ `set_a & set_b`: elements in both a and b. Equivalent to `set_a.intersection(set_b)`
+ `set_a ^ set_b`: elements in a or b but not both. Equivalent to `set_a.symmetric_difference(set_b)` 
+ `set_a <= set_b`: tests whether every element in set_a is in set_b. Equivalent to `set_a.issubset(set_b)`
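Here is a brief sketch of these operations using the operator forms. The sets `promo_users` and `all_users` are made-up examples; results are printed through `sorted(...)` because sets have no guaranteed order.

```python
promo_users = {"ann", "bob"}
all_users = {"ann", "bob", "cat"}

promo_users.add("cat")
promo_users.add("cat")       # adding an existing element has no effect
promo_users.remove("bob")    # would raise KeyError if "bob" were absent

is_subset = promo_users <= all_users      # every promo user is a user
only_in_all = all_users - promo_users     # difference
in_both = all_users & promo_users         # intersection
in_either = all_users | promo_users       # union
in_one_only = all_users ^ promo_users     # symmetric difference

print(sorted(promo_users))
print(is_subset)
print(sorted(only_in_all))
print(sorted(in_both))
print(sorted(in_either))
print(sorted(in_one_only))
```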


#### Exercise

Try the above yourself using the `my_set` and `another_set` variables from above, and compute the difference, union, intersection, and symmetric difference between the two sets.

In [63]:
# Your code here
set_A = {1, 2, 3, 4, 5}
set_B = {4, 5, 6, 7}
print "Set A", set_A
print "Set B", set_B
print "Difference", set_A.difference(set_B)
print "Union", set_A.union(set_B)
print "Intersection", set_A.intersection(set_B)
print "Symmetric Difference", set_A.symmetric_difference(set_B)

Set A set([1, 2, 3, 4, 5])
Set B set([4, 5, 6, 7])
Difference set([1, 2, 3])
Union set([1, 2, 3, 4, 5, 6, 7])
Intersection set([4, 5])
Symmetric Difference set([1, 2, 3, 6, 7])


Now, let's try to use the [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index) to compute the similarity of the two sets. The Jaccard coefficient is defined as the size of the intersection of the two sets divided by the size of their union.

In [66]:
# Your code here
size_union = len(set_A | set_B)
size_intersection = len(set_A & set_B)
print size_union
print size_intersection 
print "Jaccard =", 1.0*size_intersection/size_union

7
2
Jaccard = 0.285714285714
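The computation above can be wrapped in a small reusable function. This is a sketch, not part of the original notes: the name `jaccard` is our own, and returning 0.0 when both sets are empty is a convention chosen here to avoid dividing by zero.

```python
def jaccard(set_a, set_b):
    """Return |A & B| / |A | B| for two sets."""
    union_size = len(set_a | set_b)
    if union_size == 0:
        # Assumption of this sketch: two empty sets get similarity 0.0
        return 0.0
    return float(len(set_a & set_b)) / union_size

similarity = jaccard({1, 2, 3, 4, 5}, {4, 5, 6, 7})
print(similarity)
```

With the function in place, comparing any two collections of words reduces to `jaccard(set(words1), set(words2))`.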


#### Exercise

Now, let's pick a few news articles from the web and paste them in the notebook (as in the case of the Washington Post above). Then compute the similarity of these articles using the Jaccard similarity.

In [77]:
trump1 = '''
It's a step that the European Central Bank, among others, has already taken, resulting in bizarre situations where banks can end up paying customers who borrow from them. The idea has also been floated in the United States.
The Bank of Japan announcement Friday is the latest surprise move by its governor, Haruhiko Kuroda, in his drive to spur momentum in the world's third-largest economy. He had previously denied plans to take the interest rate below zero.
"Governor Kuroda has gained notoriety by changing course when it is least expected, and today's move will only serve to cement this reputation," said Marcel Thieliant of Capital Economics.
Related: Japan's master of surprise shocks with subzero interest rates
Investors responded positively to the announcement. Stocks in Tokyo rose 2.8% and the country's currency, the yen, fell against the dollar.
Financial markets' turbulent start to 2016 has been particularly punishing for Japan. Prior to the central bank's move, stocks had tanked around 10% in January, and the yen had strengthened.
The plunge in crude oil prices, meanwhile, has made it even harder for the Bank of Japan to hit its inflation target of 2%.
The central bank said the Japanese economy was in the midst of a moderate recovery, but it expressed concerns about plummeting oil prices and the uncertain outlook for emerging economies, especially China.
It's unclear how much difference subzero rates will make to the Japanese economy. The ECB has used them among an array of stimulus efforts, but the euro zone has continued to struggle with deflation.
"With interest rates already at record lows, we do not expect these measures to have a significant impact on the real economy, or inflation," said Izumi Devalier, Japan economist at HSBC.
Related: Japan economy minister to resign over funding scandal
The Bank of Japan's decision to introduce a negative interest rate was also far from unanimous. Five policy board members voted in favor, but four opposed the move.
Japan has long struggled with deflation, and prices have been stagnating despite the central bank's aggressive stimulus measures in recent years that include a massive bond-buying program. It said Friday that it was leaving its asset-purchase plan unchanged.
The bank's moves have come at a time when the government of Prime Minister Shinzo Abe has tried to jolt the economy into life by increasing spending and pushing through reforms. That program took a hit Thursday when Abe's economy minister announced his resignation over a political-funding scandal.
The Bank of Japan's announcement also follows closely watched statements from other major central banks amid the recent market turmoil.
Last week, ECB President Mario Draghi gave stocks a lift by promising that the bank could pump out more money as early as March if necessary.
The U.S Federal Reserve, which raised interest rates last month, said Wednesday it was "monitoring global economic and financial developments."

'''

trump2 = '''
TOKYO—Japan’s central bank stunned the markets Friday by setting the country’s first negative interest rates, in a desperate attempt to keep the economy from sliding back into the stagnation that has dogged it for much of the last two decades.

The unexpected move shows the Bank of Japan’s determination to fight global headwinds that threaten to tip the country back into deflation, a damaging cycle of price falls and weakening economy.

ENLARGE
Yet it also shows how few policy options the BOJ has left. The central bank is already buying ¥80 trillion ($674 billion) in assets a year, putting nearly a third of Japan’s massive bond market in its hands. It left the size of that asset-buying program unchanged.

After three years of BOJ asset purchases, inflation expectations in Japan are sagging, and recent volatility in global markets has threatened to undo some of what the BOJ had achieved with its extraordinary easing: a weaker yen and higher stock prices.

BOJ Gov. Haruhiko Kuroda said the decision to introduce negative rates was meant to limit the risk that global conditions would derail the central bank’s efforts to change Japan’s “deflationary mindset.”

“Risks were growing that the slowdown in the Chinese, emerging and resource-producing countries, which has caused volatility and instability in financial markets since the beginning of the year, may hurt confidence among domestic companies,” he said.

The yen fell as much as 2.1% following the announcement, hitting 121.33 to the dollar. The Nikkei Stock Average seesawed before closing up 2.8%. Some government bonds saw rates turn more deeply negative. The two- and five-year yield fell to their most negative yet, both hitting as low as 0.085%.

RELATED

5 Things About the Bank of Japan's Negative Interest Rates
Japan Follows Europe Into Negative-Rate Territory
Heard on the Street: Japan’s Negative-Rate Plunge More Like a Toe in the Water
Kuroda’s Latest BOJ Bazooka Round: Bang or Whimper?
BOJ’s Goal Is Complicated by Sheer Scale of Reserves
The BOJ’s move could also add further pressure on the U.S. Federal Reserve to hold back on raising interest rates, less than a month after it started tightening again, as economies throughout the globe show signs of distress and weakness.

The Bank of Japan is now the second major central bank to set negative interest rates, joining the European Central Bank, which first did so in 2014. The central banks of Sweden, Denmark and Switzerland also have negative interest-rate policies.

The introduction of negative rates once again signaled Mr. Kuroda’s willingness to shock markets. He denied in recent days that the bank was considering negative rates.

Mr. Kuroda has maintained that three years of aggressive quantitative easing have had “intended effects.” He has blamed falling oil prices for the low inflation and pointed to a price index that excludes energy prices as evidence that underlying inflation is strong.

Yet those policies have failed to produce the targeted 2% inflation, and the BOJ on Friday again pushed back the expected time frame for reaching that goal, while again lowering its inflation forecast for the fiscal year beginning in April.

Many economists had concluded that the asset-buying program was nearing the limits of its capacity and effectiveness.

Takuji Okubo, chief economist at Japan Macro Advisors, said the BOJ made the right decision in introducing negative rates—given the limited options available to it. “I don’t think purchasing more JGBs [Japanese government bonds] would have done much,” he said.

ENLARGE
The BOJ said it would leave unchanged the 0.1% rate on most existing reserves held at the central bank, while cutting the rate on required reserves to zero and charging 0.1% on reserves in excess of those required.

In a news conference later Friday, Mr. Kuroda reiterated his running pledge that the BOJ would take additional action if necessary. He didn’t rule out the possibility of increasing the BOJ’s asset-buying.

The goal of the rate regime was to push down borrowing costs to stimulate inflation, the bank said in a statement following its two-day policy meeting. It also said it would cut the interest rate further into negative territory if necessary.

Below-zero rates are used to encourage lending and spur credit growth. When imposed on banks’ reserves with a central bank, financial institutions typically look to lend more or invest elsewhere, in turn boosting asset prices, according to Oxford Economics economists.

Just how effective the BOJ’s move might be is unclear. European central banks that have adopted negative interest-rate policies were aiming to reduce short-term market rates and weaken the exchange rate, economists from London-based think tank Oxford Economics said in a January research note.

Denmark and Sweden struggled to weaken their respective currencies, though, because their biggest trading partner, the eurozone, followed suit to push rates into negative territory, they said.

While economists are still assessing the full impact of negative interest rates on inflation and economic activity in Europe, many say the policy has pushed down banks’ borrowing rates and longer-term bond yields.

Japanese banks will face a more challenging environment as a result of negative rates, but won’t be badly hurt, analysts said. Most of the extra cash they have raised selling government bonds is now parked at the BOJ because domestic lending is sluggish and banks have a hard time finding attractive investments.

“I don’t think there will be a big immediate impact on banks’ profitability,” S&P analyst Kiyoko Ohora said, noting the BOJ’s decision to leave the rate on existing reserves unchanged.

'''

In [80]:
words1 = trump1.lower().split(" ")
words2 = trump2.lower().split(" ")
set_words1 = set(words1)
set_words2 = set(words2)

print "Characters of article 1:", len(trump1)
print "Characters of article 2:", len(trump2)
print "Length of article 1:", len(words1)
print "Length of article 2:", len(words2)
print "Distinct words in article 1:", len(set_words1)
print "Distinct words in article 2:", len(set_words2)


Characters of article 1: 2980
Characters of article 2: 5789
Length of article 1: 461
Length of article 2: 878
Distinct words in article 1: 291
Distinct words in article 2: 463


In [81]:
intersection = len( set_words1.intersection(set_words2) )
union = len( set_words1.union(set_words2) )
print "Common words = ", intersection
print "Total words = ", union
similarity = 1.0*intersection/union
print similarity

Common words =  95
Total words =  659
0.144157814871


### Tuples

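A tuple is an ordered sequence of values separated by commas and usually enclosed in parentheses. Elements are accessed by index, as with a list, but a tuple cannot be modified after it is created. For instance: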

In [82]:
t = (12345, 54321, 'hello!')
print t

(12345, 54321, 'hello!')


In [83]:
print t[2]

hello!


In [84]:
print "Two elements. The first one: %s and the second one %s:" % ("NYU", "stern")

Two elements. The first one: NYU and the second one stern:
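Two further properties of tuples are worth a short sketch: a tuple can be unpacked into separate variables, and, unlike a list, it cannot be changed after creation. The tuple `point` below is a made-up example.

```python
point = (3, 4, "label")

# Unpacking assigns each element to its own variable
x, y, name = point

# Tuples are immutable: item assignment raises TypeError
try:
    point[0] = 10
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

single = (5,)        # the trailing comma makes a one-element tuple
not_a_tuple = (5)    # without the comma this is just the integer 5

print(x + y)
print(name)
print(tuple_is_mutable)
print(len(single))
print(not_a_tuple)
```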


### Dictionaries

(Readings: LPTHW, Ex 39)

Dictionaries, sometimes called dicts, maps, or, rarely, hashes, are data structures containing key-value pairs. Dictionaries have a set of unique keys and are used to retrieve the value associated with each key. For instance, a dictionary might store each user's location keyed by user, or each product's description keyed by product id. Lookup into a dictionary is very efficient, which is one reason these data structures are so frequently encountered in practice.

Dictionaries are specified by curly braces, `{ }`, containing zero or more comma-separated key-value pairs, where each key is separated from its value by a colon, `:`. As with a list index, the value for a particular key is retrieved by passing the key in square brackets.

In [85]:
a_dict = {"a":1,
          "b":2, 
          "c":3, 
          "d":4}
print a_dict

{'a': 1, 'c': 3, 'b': 2, 'd': 4}


In [88]:
# A key cannot be repeated
# See what happens when we repeat the key "c": the last value assigned wins
a_dict = {"a":1, "b":4, "c":3, "c": 4}
print a_dict

{'a': 1, 'c': 4, 'b': 4}


In [92]:
freegeoip_dict = {
"ip": "216.165.95.68",
"country_code": "US",
"country_name": "United States",
"region_code": "NY",
"region_name": "New York",
"city": "New York",
"zip_code": "10003",
"time_zone": "America/New_York",
"latitude": 40.7317,
"longitude": -73.9885,
"metro_code": 501
}
print freegeoip_dict

{'region_code': 'NY', 'region_name': 'New York', 'ip': '216.165.95.68', 'metro_code': 501, 'country_name': 'United States', 'country_code': 'US', 'city': 'New York', 'time_zone': 'America/New_York', 'longitude': -73.9885, 'latitude': 40.7317, 'zip_code': '10003'}


In [93]:
print freegeoip_dict["ip"]

# or, alternatively

print freegeoip_dict.get("ip")

216.165.95.68
216.165.95.68
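The two lookup styles above differ when the key is missing: square brackets raise a `KeyError`, while `get` returns `None` or a default that you supply. A minimal sketch, using a smaller made-up dictionary `geo`:

```python
geo = {"city": "New York", "zip_code": "10003"}

city = geo.get("city")                    # 'New York'
missing = geo.get("isp")                  # None, no error
with_default = geo.get("isp", "unknown")  # 'unknown'

# Square brackets raise KeyError for a missing key
try:
    geo["isp"]
    raised = False
except KeyError:
    raised = True

print(city)
print(missing)
print(with_default)
print(raised)
```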


In [95]:
freegeoip_dict["isp"] = "New York University"

from pprint import pprint
pprint(freegeoip_dict)

{'city': 'New York',
 'country_code': 'US',
 'country_name': 'United States',
 'ip': '216.165.95.68',
 'isp': 'New York University',
 'latitude': 40.7317,
 'longitude': -73.9885,
 'metro_code': 501,
 'region_code': 'NY',
 'region_name': 'New York',
 'time_zone': 'America/New_York',
 'zip_code': '10003'}


In [None]:
a_dict = {"a":1, "b":2, "c":3, "c": 4}
another_dict = {"c":5, "d":6}
print a_dict["c"]

Like the set, the easiest way to check if a particular **key** is in a dictionary is through the `in` keyword:

In [97]:
a_dict = {"a":"e", "b":2, "c":3, "c": 4}
print "b" in a_dict
print "z" in a_dict

True
False


Notice that `in` only checks the keys of a dictionary; it will not find a value.

In [98]:
# This does *not* work for values
a_dict = {"a":"e", "b":2, "c":3, "c": 4}
print "e" in a_dict

False
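To look for a value rather than a key, one option is to test membership in `values()`. The sketch below uses a separate dictionary named `lookup` (a hypothetical name) so that it does not disturb `a_dict`.

```python
lookup = {"a": "e", "b": 2, "c": 4}

key_found = "e" in lookup                # False: "e" is not a key
value_found = "e" in lookup.values()     # True: "e" is a value

# Collect the keys that map to a given value
keys_for_value = [k for k in lookup if lookup[k] == "e"]

print(key_found)
print(value_found)
print(keys_for_value)
```

Searching the values examines them one by one, so it does not have the fast lookup that keys enjoy.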


Some common operations on dictionaries:

+ `dict.keys()`: returns a list containing the keys of a dictionary
+ `dict.values()`: returns a list containing the values in a dictionary
+ `dict.pop(x)`: removes the key `x` and its associated value from the dictionary, returning the value
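A short sketch of these operations on a made-up dictionary `inventory`. The keys are sorted before printing because the order in which a dictionary returns its keys should not be relied upon.

```python
inventory = {"apples": 3, "pears": 5, "plums": 0}

removed_value = inventory.pop("plums")    # returns 0 and deletes the key
fallback = inventory.pop("kiwis", None)   # a default avoids a KeyError

sorted_keys = sorted(inventory.keys())
total = sum(inventory.values())

for key in sorted_keys:
    print("%s: %d" % (key, inventory[key]))
print(total)
```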

In [115]:
value_for_key = "e"
b_dict = {"b":2, 
          "c":3, 
          "c": 4, 
          "z": value_for_key}
b_dict["a"] = b_dict["b"]+ b_dict["c"]
print "Keys:",  b_dict.keys()

Keys: ['a', 'c', 'b', 'z']


In [116]:
print "Values:", b_dict.values()

Values: [6, 4, 2, 'e']


In [117]:
print len(b_dict)

4


#### Exercise

* Find the common keys in `a_dict` and `b_dict`
* Find the common values in `a_dict` and `b_dict` 


In [122]:
# your code here
a_dict = {"a":"e", "b":2, "c":3, "c": 4}
b_dict = {"c":5, "d":6}

akeys = set(a_dict.keys())
bkeys = set(b_dict.keys())
print(akeys & bkeys)

avalues = set(a_dict.values())
bvalues = set(b_dict.values())

print (avalues & bvalues)

set(['c'])
set([])


### Combining (Nesting) Data Structures:

There are many opportunities to combine data types in Python. Lists can be populated by arbitrary data structures. Similarly, you can use any type as the value in a dictionary. However, the elements of sets and the keys of dictionaries need to have some special properties that allow the mechanics of the data structure to determine how to store the element.

Aside: to use a particular element in a set or as a key in a dictionary, it must define a [hash function](http://en.wikipedia.org/wiki/Hash_function), `__hash__`. In a nutshell, a hash function maps a data element to a number in a predefined range, based on the characteristics of that element. If the contents of an element could change, the value of its `__hash__` function would change too, causing problems for the algorithms powering sets and dictionaries. This is why mutable structures such as lists, sets, and dictionaries cannot be used as set elements or dictionary keys, while immutable values such as numbers, strings, and tuples of immutable values can.
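The hashing requirement can be seen directly in a small sketch: a tuple works as a dictionary key or set element, while a list is rejected with a `TypeError`. The names below are illustrative.

```python
# A tuple of numbers is hashable, so it can serve as a key
coordinates = {(40.7317, -73.9885): "New York"}

# A list is mutable and therefore unhashable
try:
    bad = {[40.7317, -73.9885]: "New York"}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

# Equal tuples hash equally, so a set keeps only one copy
unique_pairs = {(1, 2), (1, 2), (2, 1)}

print(coordinates[(40.7317, -73.9885)])
print(list_key_allowed)
print(len(unique_pairs))
```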

In [125]:
print "lists of lists"
lol = [ [1, 2, 3], [4, 5, 6, 7] ]
lol_2 = [ [4, 5, 6], [7, 8, 9] ]
print len(lol)

lists of lists
2


In [127]:
print "retrieving data from this data structure"
print "Lol[1]:",lol[1]

retrieving data from this data structure
Lol[1]: [4, 5, 6, 7]


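In [None]:
# lolol is not defined earlier in these notes; here we assume it is a list holding lol and lol_2
lolol = [lol, lol_2]
print "Lolol[0][0]:",lolol[0][0]

In [None]:
print "Lolol[0][0][0]:",lolol[0][0][0]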

In [128]:
print "data structures as values in a dictionary"
dlol = {"lol":lol, "lol_2":lol_2}
print dlol

data structures as values in a dictionary
{'lol': [[1, 2, 3], [4, 5, 6, 7]], 'lol_2': [[4, 5, 6], [7, 8, 9]]}


In [129]:
# Access the list [4, 5, 6, 7] in the "lol" key
print "Accessing the list of lists named lol:", dlol["lol"]
print "Accessing the second element :", dlol["lol"][1]

Accessing the list of lists named lol: [[1, 2, 3], [4, 5, 6, 7]]
Accessing the second element : [4, 5, 6, 7]


In [None]:
print "retrieving data from this dictionary"
print dlol["lol"]
print dlol["lol"][0]
print dlol["lol"][0][0]

#### Exercise

You are given the following data structure.

`data = {
    "Panos": {
        "Job":"Professor", 
        "YOB": "1976", 
        "Children": ["Gregory", "Anna"]
        }, 
    "Joe": {
        "Job":"Data Scientist", 
        "YOB": "1981"
        }
    }`

You need to write code that

* Prints the job of Joe
* Prints the year of birth of Panos
* Prints the children of Panos
* Prints the second child of Panos
* Prints the number of people entries in the data
* Checks if Maria is in the data
* Checks if Panos has children
* Checks if Joe has children

In [153]:
# your code here
data = {"Panos": 
            {
            "Job":"Professor", 
            "YOB": "1976", 
            "Children": ["Gregory", "Anna"]
            }, 
        "Joe": 
            {
            "Job":"Data Scientist", 
            "YOB": "1981",
            "Children": []
            }
        }

In [133]:
# Prints the job of Joe
print data["Joe"]["Job"]

Data Scientist


In [135]:
# Prints the year of birth of Panos
print data["Panos"]["YOB"]

1976


In [136]:
# Prints the children of Panos
print data["Panos"]["Children"]

['Gregory', 'Anna']


In [138]:
# Prints the second child of Panos
print data["Panos"]["Children"][1]

Anna


In [139]:
# Prints the number of people entries in the data
print len(data)

2


In [142]:
# Checks if Maria is in the data
print "Maria" in data

False


In [143]:
# Checks if Panos has children
print "Children" in data["Panos"]

True


In [154]:
# Checks if Joe has children
print ("Children" in data["Joe"] and len(data["Joe"]["Children"]) > 0)


False
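The last check has to guard against a missing `"Children"` key before measuring the list. One possible way to simplify such checks is to chain `get` calls with defaults. The sketch below is an alternative, not the original solution: the helper `children_of` and the variable `people` are hypothetical names, and `people` follows the exercise statement, where Joe has no `"Children"` entry.

```python
people = {
    "Panos": {"Job": "Professor", "YOB": "1976", "Children": ["Gregory", "Anna"]},
    "Joe": {"Job": "Data Scientist", "YOB": "1981"},
}

def children_of(records, name):
    """Return the list of children for name, or an empty list if unknown."""
    # Each get() supplies a default, so no KeyError is raised for a
    # missing person or a missing "Children" entry
    return records.get(name, {}).get("Children", [])

panos_children = children_of(people, "Panos")
joe_has_children = len(children_of(people, "Joe")) > 0
maria_children = children_of(people, "Maria")

print(panos_children)
print(joe_has_children)
print(maria_children)
```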
