# Data Types, Structure, and Algorithms


## Data Types

* Numeric types: int, float, complex
* Iterator types: iterators and generators, which can iterate over any iterable such as a string, list, or tuple
* Sequence types: list, tuple, range (plus `array.array` from the standard library)
* Text sequence type: str (strings)
* Binary sequence types: bytes, bytearray, memoryview
* Set types: set, frozenset
* Mapping type: dict (dictionary)

In [1]:
# Some example data types
a = 2
print("The data type of a", type(a))
b = 2.0
print("The data type of b", type(b))
c = 3+5j
print("The data type of c", type(c))
d = "Hello"
print("The data type of d", type(d))
e = [2.0, 5, "Hello"]
print("The data type of e", type(e))
f = (2.0, 5, "Hello")
print("The data type of f", type(f))
g = {'a', 5, 'b', 10}   # set literal: unordered collection of unique items
print("The data type of g", type(g))
h = {'a': 5, 'b': 10}   # dict literal: key-value pairs
print("The data type of h", type(h))

The data type of a <class 'int'>
The data type of b <class 'float'>
The data type of c <class 'complex'>
The data type of d <class 'str'>
The data type of e <class 'list'>
The data type of f <class 'tuple'>
The data type of g <class 'set'>
The data type of h <class 'dict'>
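The cell above covers the numeric, text, sequence, set and mapping types. The remaining categories from the list (iterators/generators and binary sequences) can be shown with a short sketch; the names `squares`, `raw`, `buf`, `view` and `fs` are illustrative only.

```python
# A generator function returns an iterator that produces values lazily
def squares(n):
    for k in range(n):
        yield k * k

gen = squares(4)
print(type(gen))            # <class 'generator'>
print(list(gen))            # [0, 1, 4, 9]

# iter() turns any iterable (str, list, tuple, ...) into an iterator
it = iter("abc")
print(next(it), next(it))   # a b

# Binary sequence types
raw = bytes([72, 105])      # immutable sequence of bytes: b'Hi'
buf = bytearray(raw)        # mutable counterpart
buf[0] = 104                # replace 'H' (72) with 'h' (104)
view = memoryview(buf)      # zero-copy view onto the bytearray's memory
print(raw, buf, view[1])    # b'Hi' bytearray(b'hi') 105

# frozenset is the immutable set type
fs = frozenset({1, 2, 3})
print(type(fs))             # <class 'frozenset'>
```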


In [2]:
# Iterating over a tuple. Tuples are immutable (they can't be changed after
# creation), whereas lists are mutable.

print("\nTuple Iteration")
t = ("geeks", "for", "geeks")
for i in t:
    print(i)


Tuple Iteration
geeks
for
geeks
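The difference between immutable tuples and mutable lists shows up as soon as you try to change an item. A minimal sketch reusing the same tuple (`lst` is an illustrative name):

```python
t = ("geeks", "for", "geeks")
lst = list(t)                 # a mutable copy of the tuple's items

lst[1] = "4"                  # lists are mutable: items can be replaced in place
print(lst)                    # ['geeks', '4', 'geeks']

try:
    t[1] = "4"                # tuples are immutable: item assignment fails
except TypeError as err:
    print("TypeError:", err)  # TypeError: 'tuple' object does not support item assignment
```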


In [3]:
print("The data type of i", type(i))

The data type of i <class 'str'>


## Data Structure

Data structures are the fundamental constructs around which you build your programs. Each data structure provides a particular way of storing, managing, organizing, and searching data so that it can be accessed efficiently for a given use case. For instance, pandas has its own data structures for one-dimensional (Series) and two-dimensional (DataFrame) data.
Data structures are generally classified into two types:
* Linear data structures (elements are arranged in a sequence): arrays, linked lists, stacks, queues ...
* Non-linear data structures (an element can be connected to several others): trees, graphs ...
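To make the linear/non-linear distinction concrete, here is a minimal sketch with two hypothetical node classes: in a singly linked list each node has at most one successor, while in a binary tree each node can have two children.

```python
class ListNode:
    """Node of a singly linked list (linear: at most one successor)."""
    def __init__(self, value, nxt=None):
        self.value = value
        self.nxt = nxt

class TreeNode:
    """Node of a binary tree (non-linear: up to two children)."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def walk_list(head):
    """Visit linked-list nodes one after another, front to back."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.nxt
    return out

def walk_tree(node):
    """In-order traversal: left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return walk_tree(node.left) + [node.value] + walk_tree(node.right)

linked = ListNode(1, ListNode(2, ListNode(3)))
tree = TreeNode(2, TreeNode(1), TreeNode(3))
print(walk_list(linked))   # [1, 2, 3]
print(walk_tree(tree))     # [1, 2, 3]
```

Both traversals produce the same values here, but the list has only one path through it, while the tree branches at the root.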

### List 

A list is the main data structure used to store a mutable sequence of elements. Lists provide several methods: append, extend, insert, remove, pop, clear, index, count, sort, reverse, and copy. In a Jupyter notebook you can see all of them by typing the list's name followed by a dot (.) and pressing Tab.

In [4]:
numbers = [3, 5, 2, 5, 6, 4, 7]   # avoid the name `list`, which would shadow the built-in
#numbers.<TAB>
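The list methods named above can be exercised one after another on the same sample values. A minimal sketch (`nums`, `last` and `backup` are illustrative names):

```python
nums = [3, 5, 2, 5, 6, 4, 7]

nums.append(8)          # add one item at the end
nums.extend([1, 9])     # add every item of an iterable at the end
nums.insert(0, 10)      # insert 10 at index 0
nums.remove(5)          # remove the FIRST occurrence of 5
last = nums.pop()       # remove and return the last item (9)
print(nums.index(6))    # position of the first 6 -> 4
print(nums.count(5))    # how many 5s remain -> 1
backup = nums.copy()    # shallow copy: [10, 3, 2, 5, 6, 4, 7, 8, 1]
nums.sort()             # sort in place, ascending
nums.reverse()          # reverse in place -> [10, 8, 7, 6, 5, 4, 3, 2, 1]
print(nums, last, backup)
nums.clear()            # remove all items
```

Note that `sort()`, `reverse()` and `clear()` modify the list in place and return `None`.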

#### Using Lists as Stacks
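The end of a list can serve as the top of a stack (last in, first out): `append()` pushes and `pop()` with no argument removes the most recently added item, both in amortized constant time. A minimal sketch:

```python
stack = [3, 4, 5]
stack.append(6)       # push onto the top (end of the list)
stack.append(7)
top = stack.pop()     # pop from the top: returns 7
print(top, stack)     # 7 [3, 4, 5, 6]
```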
#### Using Lists as Queues
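A list can act as a first-in, first-out queue, but `pop(0)` or `insert(0, x)` must shift every remaining element, which costs O(n). The standard library's `collections.deque` supports fast appends and pops at both ends, so this sketch of a queue uses it instead of a plain list:

```python
from collections import deque

queue = deque(["Eric", "John", "Michael"])
queue.append("Terry")         # enqueue at the right end
first = queue.popleft()       # dequeue from the left end in O(1): "Eric"
print(first, list(queue))     # Eric ['John', 'Michael', 'Terry']
```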
#### List Comprehensions
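A list comprehension builds a list from an iterable in a single expression, optionally filtering items with an `if` clause. A minimal sketch comparing it with the equivalent loop:

```python
# Loop version
squares = []
for x in range(10):
    squares.append(x ** 2)

# Equivalent list comprehension
squares_lc = [x ** 2 for x in range(10)]

# With a filter condition: keep only the squares of even numbers
even_squares = [x ** 2 for x in range(10) if x % 2 == 0]

print(squares_lc)     # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(even_squares)   # [0, 4, 16, 36, 64]
```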
#### Nested List Comprehensions
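A comprehension can itself contain another comprehension, or several `for` clauses. A minimal sketch that transposes a 3x4 matrix and then flattens it:

```python
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# The outer comprehension picks a column index; the inner one collects
# that column from every row
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
print(transposed)   # [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

# Flattening: the for-clauses read left to right, like nested for-loops
flat = [value for row in matrix for value in row]
print(flat)         # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```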

## Algorithm Analysis
In short, an algorithm is a set of steps a computer program follows to accomplish a task. It can also be defined as a generic, step-by-step list of instructions for solving a problem: a method for solving any instance of the problem such that, given a particular input, the algorithm produces the desired result. There may be many programs for the same algorithm, depending on the programmer and the programming language being used, and there are usually several ways to solve a problem with a computer program. Can we say one algorithm is better than another? Algorithms can be compared based on the amount of computing resources (time and memory) each one uses.

Algorithm analysis assesses the complexity of different algorithms in order to find the most efficient one for a given problem. Time complexity describes how the time an algorithm takes to run grows with the size of its input. Big-O notation is the mathematical notation used to express this growth: it gives an upper bound on the growth rate, ignoring constant factors and lower-order terms, so it describes how an algorithm scales rather than how many seconds it takes on a particular machine.
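To make growth rates concrete, the sketch below counts basic steps for three hypothetical functions instead of timing them, so the result does not depend on the machine. Multiplying n by 10 leaves the O(1) count unchanged, multiplies the O(n) count by 10, and multiplies the O(n^2) count by 100.

```python
def constant_steps(n):
    """O(1): a fixed amount of work, independent of n."""
    return 1

def linear_steps(n):
    """O(n): one step per element, as in a single loop."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps

def quadratic_steps(n):
    """O(n^2): one step per pair of elements, as in two nested loops."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps

for n in (10, 100, 1000):
    print(f"n={n:>5}  O(1): {constant_steps(n):>3}  "
          f"O(n): {linear_steps(n):>5}  O(n^2): {quadratic_steps(n):>8}")
```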

In [5]:
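# Time the list comprehension itself. Quoting the expression, as in
# %timeit '[print(x) for x in range(100)]', only times evaluating a string
# constant, which is why that version reported about 10.7 ns per loop.
# print() is left out so the benchmark does not flood the output.
%timeit [x for x in range(100)]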


In [None]:
%timeit [x for x in range(10)]

The following two examples show how long each program takes to do the same task: summing the integers from 1 to 100 million. The second example uses far fewer computing resources because it evaluates the closed-form formula n(n + 1)/2 with a constant number of operations instead of looping n times. The first example takes an average of ~7.4 seconds, whereas the second takes nearly zero seconds.

In [None]:
# Example 1-a: sum 1..n with a loop, O(n)
import time

def sum_of_n_1(n):
    start = time.time()

    the_sum1 = 0
    for i in range(1, n + 1):
        the_sum1 = the_sum1 + i

    end = time.time()

    return the_sum1, end - start

for i in range(5):
    print("Sum is %d required %10.7f seconds" % sum_of_n_1(100000000))

In [None]:
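# Example 1-b: sum 1..n with the closed-form formula, O(1)
import time

def sum_of_n_2(n):
    start = time.time()

    # Integer division keeps the result an int (n * (n + 1) is always even)
    the_sum2 = (n * (n + 1)) // 2

    end = time.time()

    return the_sum2, end - start

for i in range(5):
    print("Sum is %d required %10.7f seconds" % sum_of_n_2(100000000))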

In [None]:
# Another example: calculate the factorial of a number
# Example 2-a: iterative version
def fact(n):
    product = 1
    for i in range(n):
        product = product * (i + 1)
    return product

print(fact(70))
%timeit fact(70)

In [None]:
# Example 2-b: recursive version
def fact2(n):
    if n == 0:
        return 1
    else:
        return n * fact2(n - 1)

print(fact2(70))
%timeit fact2(70)
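Both factorial versions perform O(n) multiplications, but the recursive one also uses one stack frame per call, so it is bounded by Python's recursion limit. A sketch, with `fact_iter` and `fact_rec` as stand-ins for the two examples, that checks both against the standard library's `math.factorial` and then shows the depth limit:

```python
import math
import sys

def fact_iter(n):
    """Iterative factorial: a single loop, O(n) multiplications."""
    product = 1
    for i in range(2, n + 1):
        product *= i
    return product

def fact_rec(n):
    """Recursive factorial: one stack frame per call."""
    return 1 if n == 0 else n * fact_rec(n - 1)

# Both versions agree with math.factorial
print(fact_iter(70) == fact_rec(70) == math.factorial(70))   # True

# Recursion depth is limited; the loop version has no such limit
limit = sys.getrecursionlimit()
try:
    fact_rec(limit + 100)
except RecursionError:
    print("fact_rec exceeded the recursion limit of", limit)
print(fact_iter(limit + 100).bit_length(), "bits computed iteratively")
```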

In [None]:
!pip install folium

In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # library to convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas.io.json.json_normalize is deprecated since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors


from sklearn.cluster import KMeans  # import k-means from clustering stage

import folium # map rendering library

import urllib.request    # library module for opening URL
from bs4 import BeautifulSoup    # library for pulling data out of HTML and XML files

print('Libraries imported.')

In [None]:
# Download the Wikipedia page listing Virginia locations by per capita income
url = "https://en.wikipedia.org/wiki/List_of_Virginia_locations_by_per_capita_income"
page = urllib.request.urlopen(url)

In [None]:
soup = BeautifulSoup(page, "html.parser")
right_table=soup.find('table', class_='wikitable sortable')
right_table

In [None]:
A=[]
B=[]
C=[]
D=[]
E=[]
F=[]
G=[]
H=[]

# find_all() and string= replace the older findAll() and text= spellings
for row in right_table.find_all('tr'):
    cells = row.find_all('td')
    if len(cells) == 8:
        A.append(cells[0].find(string=True))
        B.append(cells[1].find(string=True))
        C.append(cells[2].find(string=True))
        D.append(cells[3].find(string=True))
        E.append(cells[4].find(string=True))
        F.append(cells[5].find(string=True))
        G.append(cells[6].find(string=True))
        H.append(cells[7].find(string=True))

In [None]:
import pandas as pd
df=pd.DataFrame(A,columns=['Rank'])
df['Name']=B
df['County_or_City']=C
df['PerCapitaIncome']=D
df['MedianHHIncome']=E
df['MedianFamIncome']=F
df['Population']=G
df['NumberOfHH']=H

df = df.replace('\n','', regex=True)
df.columns = ['Rank', 'Name','County_or_City',  'PerCapitaIncome', 'MedianHHIncome', 'MedianFamIncome', 'Population', 'NumberOfHH']
df.head()
#df.shape

In [None]:
url2 = 'http://www.usa.com/rank/virginia-state--median-household-income--zip-code-rank.htm'
page2 = urllib.request.urlopen(url2)

In [None]:
url2

In [None]:
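soup2 = BeautifulSoup(page2, "html.parser")
# 'wikitable sortable' is a Wikipedia table class; if this usa.com page
# uses different markup, find() returns None
right_table = soup2.find('table', class_='wikitable sortable')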
right_table

In [None]:
# import geocoder # import geocoder

# # initialize your variable to None
# lat_lng_coords = None

# # loop until you get the coordinates
# while(lat_lng_coords is None):
#     g = geocoder.google('{}, Virginia'.format(Name))  # tried as postal_code too
#     lat_lng_coords = g.latlng

# latitude = lat_lng_coords[0]
# longitude = lat_lng_coords[1]

In [None]:
import requests
import lxml.html as lh
import pandas as pd

In [None]:
#url='http://pokemondb.net/pokedex/all'
url = 'http://www.usa.com/rank/virginia-state--median-household-income--zip-code-rank.htm'
#Create a handle, page, to handle the contents of the website
page = requests.get(url)
#Store the contents of the website under doc
doc = lh.fromstring(page.content)
#Parse data that are stored between <tr>..</tr> of HTML
tr_elements = doc.xpath('//tr')

In [None]:
#Check the length of the first 12 rows
[len(T) for T in tr_elements[:12]]

In [None]:
tr_elements = doc.xpath('//tr')
#Create empty list
col=[]
i=0
# Row 1 holds the column headers: store each header name with an empty list for its values
for t in tr_elements[1]:
    i+=1
    name=t.text_content()
    print ('%d:"%s"'%(i,name))
    col.append((name,[]))

In [None]:
# Since our first row is the header, data is stored on the second row onwards
for j in range(1, len(tr_elements)):
    # T is our j'th row
    T = tr_elements[j]

    # If the row does not have 3 cells, the //tr data is not from our table
    if len(T) != 3:
        break

    # i is the index of our column
    i = 0

    # Iterate through each element of the row
    for t in T.iterchildren():
        data = t.text_content()
        # Keep the first column as text; try to convert the other columns to integers
        if i > 0:
            try:
                data = int(data)
            except ValueError:
                pass
        # Append the data to the list of the i'th column
        col[i][1].append(data)
        # Increment i for the next column
        i += 1

In [None]:
[len(C) for (title,C) in col]

In [None]:
Dict={title:column for (title,column) in col}
df=pd.DataFrame(Dict)

In [None]:
df.head()

In [None]:
# Split "Zip / Population" into two columns
new = df["Zip / Population"].str.split(" / ", n=1, expand=True)

# Zip code column from the first part
df["Zip"] = new[0]

# Population column from the second part
df["Population"] = new[1]

# Drop the original combined column
df.drop(columns=["Zip / Population"], inplace=True)

# Display the result
df.head()

In [None]:
# Drop the first row, which repeats the table header
df = df[1:]
df.head()

In [None]:
# import geocoder # import geocoder

# # initialize your variable to None
# lat_lng_coords = None

# # loop until you get the coordinates
# while(lat_lng_coords is None):
#     g = geocoder.google('{}, Virginia, VA'.format(zip))  # tried as postal_code too
#     lat_lng_coords = g.latlng

# latitude = lat_lng_coords[0]
# longitude = lat_lng_coords[1]

In [None]:
!pip install pgeocode

In [None]:
import pgeocode

In [None]:
nomi = pgeocode.Nominatim('us')

In [None]:
nomi.query_postal_code(df['Zip'].tolist())

In [None]:
x =[]

In [None]:
x = df['Zip'].tolist()
type(x)

In [None]:
# Find the minimum value by comparing every element with every other one, O(n^2)
def findMin(alist):
    overallmin = alist[0]
    for i in alist:
        issmallest = True
        for j in alist:
            if i > j:
                issmallest = False
        if issmallest:
            overallmin = i
    return overallmin
print(findMin([5, 6, 7, 2, 9, 1, 10]) )
print(findMin([5, 6, 0,  7, 2, 9, 1, 10]) )
print(findMin([5, 6, 7, 2, 9, 11, 10, 3]) )
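`findMin` compares every element with every other one, so it does O(n^2) comparisons. Keeping track of the smallest value seen so far needs only one pass, O(n). A sketch of that linear version (the name `find_min_linear` is illustrative; in practice the built-in `min()` does the same job):

```python
def find_min_linear(alist):
    """O(n): keep the smallest value seen so far in a single pass."""
    smallest = alist[0]
    for item in alist:
        if item < smallest:
            smallest = item
    return smallest

print(find_min_linear([5, 6, 7, 2, 9, 1, 10]))      # 1
print(find_min_linear([5, 6, 0, 7, 2, 9, 1, 10]))   # 0
print(find_min_linear([5, 6, 7, 2, 9, 11, 10, 3]))  # 2
```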

In [None]:
# Find the minimum of randomly generated numbers and time how long it takes
import time
from random import randrange
def findMin(alist):
    overallmin = alist[0]
    for i in alist:
        issmallest = True
        for j in alist:
            if i > j:
                issmallest = False
        if issmallest:
            overallmin = i
    return overallmin

for listSize in range(1000, 10001, 1000):
    alist = [randrange(100000) for x in range(listSize)]
    start = time.time()
    smallest = findMin(alist)   # time only the search, not the printing
    end = time.time()
    print("size: %d min: %d time: %f" % (listSize, smallest, end - start))

In [None]:
a=5
b=6
c=10
n=100
for i in range(n):
    for j in range(n):
        x = i * i
        y = j * j
        z = i * j
for k in range(n):
    w = a*k + 45
    v = b*b
d = 33
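The cell above is a classic step-counting exercise. Treating each assignment statement as one step (an assumption: loop bookkeeping is ignored), it performs 4 initial assignments (a, b, c, n), 3 assignments in each of the n^2 inner iterations, 2 in each of the n iterations of the second loop, and 1 final assignment, so T(n) = 3n^2 + 2n + 5, which is O(n^2). The hypothetical helper below counts the same steps and checks them against the formula:

```python
def count_assignments(n):
    """Count the assignments executed by the nested-loop cell for a given n.

    Assumption: each assignment is one step; loop bookkeeping is ignored.
    """
    steps = 4                  # a = 5, b = 6, c = 10, n = ...
    for i in range(n):
        for j in range(n):
            steps += 3         # x, y, z
    for k in range(n):
        steps += 2             # w, v
    steps += 1                 # d = 33
    return steps

for n in (10, 100, 1000):
    print(n, count_assignments(n), 3 * n * n + 2 * n + 5)
```

For large n the 3n^2 term dominates, which is why the lower-order terms and the constant are dropped in the Big-O classification.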