# Chapter 2: Data Structure and Functions


Main reference:<br>
- Chapters 2 and 3, *Python for Data Analysis*, by Wes McKinney

Last edited: 05/16/2021

**Contents of this Notebook:**

- [Section 1. Python Basics](#Section-1.-Python-Basics)
- [Section 2. Data Structure](#Section-2.-Data-Structure)
- [Section 3. Function](#Section-3.-Function)
- [Section 4. Files and the Operating System (05/11/2021 updated)](#Section-4.-Files-and-the-Operating-System)


## Section 1. Python Basics

### Indentation, not braces

Python uses whitespace (tabs or spaces) to structure code instead of using braces as in
many other languages like R, C++, Java, and Perl.

```python
for x in range(10):
    if x < 5:
        print("0")
    else:
        print("1")
```

A **colon** denotes the start of an indented code block after which all of the code must
be **indented** by the same amount (4 spaces or 1 tab) until the end of the block.

In [1]:
# %load_ext nb_black
for x in range(10):
    if x < 5:
        print("0")
    else:
        print("1")

0
0
0
0
0
1
1
1
1
1



A long statement can be broken across multiple lines in two ways:

- Wrapping the expression in parentheses

- Ending the line with an explicit line-continuation backslash `\`

In [2]:
# %load_ext nb_black
# Line Break
a = "1" + "2" + "3" + "4" + "5"

a

'12345'
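The cell above fits on one line, so it does not actually need a line break. As a minimal sketch of the two options listed, the same string can be built across several lines; the names `a_paren` and `a_backslash` are only illustrative and do not appear elsewhere in the notebook.

```python
# Implicit continuation: anything inside parentheses may span several lines
a_paren = (
    "1" + "2" + "3"
    + "4" + "5"
)

# Explicit continuation: the backslash must be the last character on the line
a_backslash = "1" + "2" + "3" \
    + "4" + "5"

print(a_paren)      # 12345
print(a_backslash)  # 12345
```

Parentheses are usually the safer choice, because a stray space after a backslash is a syntax error.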


### Comments

Any text preceded by a hash mark (pound sign) `#` is ignored by the Python interpreter.

```python
# This is a comment
```

### Variable and argument passing

When assigning a variable (or name) in Python, you are creating a **reference** to the
object on the right-hand side of the equals sign.

In [3]:
a = [1, 2, 3]

a

[1, 2, 3]


In [4]:
b = a

b

[1, 2, 3]


Let's change the first element in `b`.

In [5]:
b[0] = 9

b

[9, 2, 3]


In [6]:
a

[9, 2, 3]


Note that `a`'s first element changed as well. This is because `b` is just another reference to the same list object, not a copy of `a`.

<img src="images/refnotcopy.png" alt="Drawing" style="width: 520px;"/>

In [7]:
c = a.copy()


In [8]:
c

[9, 2, 3]
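To see that a copy is independent of the original, one can modify the copy and confirm that the original is unchanged. The sketch below uses fresh names (`original`, `alias`, `copied`) that are not part of the notebook.

```python
original = [1, 2, 3]
alias = original          # a second name for the same list object
copied = original.copy()  # a new list holding the same elements

copied[0] = 100           # changes only the copy
alias[1] = 200            # changes the shared object

print(original)             # [1, 200, 3]
print(copied)               # [100, 2, 3]
print(alias is original)    # True
print(copied is original)   # False
```

Note that `list.copy()` makes a shallow copy: any nested lists inside it are still shared with the original.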


### Attributes and methods

Objects typically have both attributes (data stored on the object) and methods (functions attached to the object).

In [9]:
a = "foo"


In [10]:
# a.<Press Tab>
a.isdigit()

False


### Binary operators and comparisons

Most of the binary math operations and comparisons are as you might expect.

In [11]:
1 + 2

3


In [12]:
2 ** 3

8


In [13]:
1 > 2

False


In [14]:
type(1 > 2)

bool


In [15]:
int(1 > 2)

0


#### Comparisons

- `is`: checks whether two names refer to the same object. Commonly used to check for `None`.

- `is not`: checks whether two names refer to different objects

- `==`: checks whether two values are equal

In [16]:
a = [1, 2, 3]

c = a.copy()


In [17]:
print(a is c)

print(a == c)

False
True



<img src="images/binaryop.png" alt="Drawing" style="width: 520px;"/>

### Single value type
<img src="images/datatype.png" alt="Drawing" style="width: 520px;"/>

In [18]:
# None is different from np.nan
a = None

print(a)

None



In [19]:
import numpy as np

b = np.nan

b

nan


In [20]:
a == b

False


In [21]:
np.isnan(b)

True
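One further difference is worth knowing: NaN is a float that does not compare equal to anything, including itself, which is why a dedicated check is used instead of `==`. The sketch below relies only on the standard library, using `float("nan")` and `math.isnan` as stand-ins for `np.nan` and `np.isnan`; the variable names are illustrative.

```python
import math

missing_obj = None
nan_value = float("nan")       # the same kind of value as np.nan

print(type(missing_obj))       # <class 'NoneType'>
print(type(nan_value))         # <class 'float'>
print(nan_value == nan_value)  # False: NaN never equals itself
print(missing_obj is None)     # True: the idiomatic test for None
print(math.isnan(nan_value))   # True: the reliable test for NaN
```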


#### String
You can write string literals using either single quotes `'` or double quotes `"`.

More in Chapter 6.

In [22]:
a = "Economics"
b = "Economics"

a == b

True


In [23]:
c = "economics 'major'"

c

"economics 'major'"


In [24]:
# integer to float
float(5)

5.0


In [25]:
5.0

5.0


### If and for loop

In [26]:
# Remember to indent the main content
x = 2

if x < 0:
    print("It's negative")


In [27]:
x = 3.5

if x < 0:
    print("It's negative")
elif x == 0:
    print("Equal to zero")
elif 0 < x < 5:
    print("Positive but smaller than 5")
else:
    print("Positive and larger than or equal to 5")

Positive but smaller than 5



`for` loops are for iterating over a collection (like a list or tuple) or an iterator.

```python
for value in collection:
    # do something with value
```

An iterable is any Python object capable of returning its members one at a time, permitting it to be iterated over in a `for` loop. An iterator is the object that actually produces those members, one per call to `next`.
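What a `for` loop does behind the scenes can be sketched by hand with the built-in functions `iter` and `next`. The names `collection` and `iterator` are illustrative.

```python
collection = [10, 20, 30]     # a list is an iterable

iterator = iter(collection)   # ask the iterable for an iterator
print(next(iterator))         # 10
print(next(iterator))         # 20
print(next(iterator))         # 30

# One more next() call raises StopIteration, which is how a for loop knows to stop
try:
    next(iterator)
except StopIteration:
    print("iterator exhausted")
```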

In [28]:
x = 10

for number in range(x):
    print(number)

0
1
2
3
4
5
6
7
8
9



## Section 2. Data Structure

### 2.1 Tuple
- fixed-length
- immutable: once created, its elements cannot be modified
- values enclosed in parentheses

In [29]:
# Create a tuple
tup = (4, 5, 6)


<span style="color:blue">**Note that sequences are 0-indexed in Python.**</span>

In [30]:
# Access element in tuple
tup[0]

4


In [31]:
# immutable
# tup[0] = 3
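The assignment in the cell above is commented out because it would stop the notebook with an error. As a small sketch, the error can be caught so that it is visible without interrupting execution; `new_tup` is an illustrative name.

```python
tup = (4, 5, 6)

try:
    tup[0] = 3                 # item assignment is not supported for tuples
except TypeError as err:
    print("TypeError:", err)

# The tuple is unchanged; to "modify" it, build a new tuple instead
new_tup = (3,) + tup[1:]
print(tup)      # (4, 5, 6)
print(new_tup)  # (3, 5, 6)
```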


In [32]:
# concatenate tuples
tup + (10, 11) + (100, 101)

(4, 5, 6, 10, 11, 100, 101)


In [33]:
# repeat a tuple by multiplying it by an integer
tup * 3

(4, 5, 6, 4, 5, 6, 4, 5, 6)


### 2.2 List
- variable-length
- content can be modified
- values enclosed in square brackets

In [34]:
# Create a list
a_list = [2, 3, 7, None, "foo"]


In [35]:
# convert a tuple to a list
b_list = list(tup)


In [36]:
b_list

[4, 5, 6]


In [37]:
# Adding element
a_list.append(8)

a_list

[2, 3, 7, None, 'foo', 8]


In [38]:
# remove an element
a_list.pop(5)

a_list

[2, 3, 7, None, 'foo']


In [39]:
len(a_list)

5


In [40]:
# Concatenating and combining lists, extend
c_list = [11, 12]

a_list.extend(c_list)
# extend is more efficient than
# a_list = a_list + c_list

a_list

[2, 3, 7, None, 'foo', 11, 12]


In [41]:
type((1, 2, 3))

tuple


In [42]:
d = [1, 2, 3, [4, 5]]


In [43]:
d[3][0]

4


###### The `list` function is frequently used in data processing to materialize an iterator or a lazy sequence such as `range`. For example, `range` is commonly used in a `for` loop to count iterations.

In [44]:
gen = range(10)

gen

range(0, 10)


In [45]:
# Generate a list of 10 integers starting from 0
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [46]:
# example of range in a for loop
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9



In [47]:
# for loop over an iterable
for i in list(gen):
    print(i)

0
1
2
3
4
5
6
7
8
9



In [48]:
a_list

[2, 3, 7, None, 'foo', 11, 12]


In [49]:
# for loop over an iterable with enumerate; the index is discarded by naming it `_`
for _, value in enumerate(a_list):
    print(value)

2
3
7
None
foo
11
12
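The loop above discards the index. As a short sketch of the more typical use, `enumerate` can supply both the position and the value; the list is re-created here so the example runs on its own, and `pairs` is an illustrative name.

```python
a_list = [2, 3, 7, None, "foo", 11, 12]

# enumerate yields (index, value) pairs, so no manual counter is needed
for i, value in enumerate(a_list):
    print(i, value)

# The pairs can also be materialized with list()
pairs = list(enumerate(a_list))
print(pairs[:2])  # [(0, 2), (1, 3)]
```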



###### Selecting sections of a list by using slicing notation

<img src="images/slicing.png" alt="Drawing" style="width: 520px;"/>

In [50]:
string = list("HELLO!")

string

['H', 'E', 'L', 'L', 'O', '!']


In [51]:
# Slicing semantics and reversing a list
string[0:1]

['H']


In [52]:
string[:-2]

['H', 'E', 'L', 'L']


In [53]:
string[::-1]

['!', 'O', 'L', 'L', 'E', 'H']
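Slices follow the pattern `start:stop:step`, where `start` is included and `stop` is excluded. A few more cases, sketched on the same letters (the name `letters` is illustrative):

```python
letters = list("HELLO!")

print(letters[1:4])   # ['E', 'L', 'L']  start included, stop excluded
print(letters[:2])    # ['H', 'E']       omitted start means "from the beginning"
print(letters[-2:])   # ['O', '!']       negative indices count from the end
print(letters[::2])   # ['H', 'L', 'O']  every second element
```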


In [54]:
d = reversed(string)

for i in d:
    print(i)

!
O
L
L
E
H



In [55]:
s = sorted(string)

for i in s:
    print(i)

!
E
H
L
L
O



### 2.3 Dictionary
- stores `key`-`value` pairs
- written with curly braces `{}`, using colons to separate keys from values

In [56]:
# Creating a dict
d1 = {"a": "some value", "b": [1, 2, 3, 4]}

d1

{'a': 'some value', 'b': [1, 2, 3, 4]}


In [57]:
# Access, insert and set element
d1["a"]

'some value'


In [58]:
d1[7] = "economics"

d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'economics'}


In [59]:
d1["b"].extend([11, 12, 13])


In [60]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4, 11, 12, 13], 7: 'economics'}


In [61]:
# Check if a dict contains a key
"c" in d1

False


In [62]:
# d1["c"]

d1.get("c", 0.0)

0.0


In [63]:
d1.keys()

dict_keys(['a', 'b', 7])


In [64]:
list(d1.values())

['some value', [1, 2, 3, 4, 11, 12, 13], 'economics']


### Error and Exception Handling

In [65]:
# d1["c"]
try:
    c_value = d1["c"]
except KeyError:
    c_value = 0.0

c_value

0.0


In [66]:
for element in d1:
    print(d1[element])

some value
[1, 2, 3, 4, 11, 12, 13]
economics
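Iterating over a dictionary directly yields its keys, which is why the loop above looks each value up with `d1[element]`. One common alternative, sketched below, is `dict.items()`, which yields key-value pairs; the dictionary is re-created so the example runs on its own.

```python
d1 = {"a": "some value", "b": [1, 2, 3, 4, 11, 12, 13], 7: "economics"}

# .items() yields (key, value) pairs, avoiding a lookup inside the loop
for key, value in d1.items():
    print(key, "->", value)

pairs = list(d1.items())
print(pairs[0])  # ('a', 'some value')
```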



In [67]:
d1["a"]

'some value'


### 2.4 Set
- an unordered collection of unique elements
- values enclosed in curly braces

In [68]:
# create a set
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}


In [69]:
{1, 1, 1, 2, 2, 2, 3, 3, 3}

{1, 2, 3}


###### Sets support mathematical *set operations* like union and intersection

In [70]:
s1 = {1, 2, 3, 4, 5}
s2 = {5, 6, 7}


In [71]:
# union
s1.union(s2)

{1, 2, 3, 4, 5, 6, 7}


In [72]:
# intersection
s1.intersection(s2)

{5}
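The same operations are also available as binary operators. The sketch below repeats the two sets so that it runs on its own and adds difference and symmetric difference for comparison.

```python
s1 = {1, 2, 3, 4, 5}
s2 = {5, 6, 7}

print(s1 | s2)  # union: same result as s1.union(s2)
print(s1 & s2)  # intersection: same result as s1.intersection(s2)
print(s1 - s2)  # difference: elements in s1 but not in s2
print(s1 ^ s2)  # symmetric difference: elements in exactly one of the sets
```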


### List comprehension
- generate a list concisely
- without writing explicitly a `for` loop

They take the basic form:
```python
    [expr for val in collection if condition]
```

In [73]:
# Given a list of strings, keep only those longer than 3 characters and convert them to uppercase
words = ["as", "car", "demand", "supply", "equalibrium", "market"]

# Using a for loop
words_cap = []
for word in words:
    if len(word) > 3:
        words_cap.append(word.upper())


In [74]:
words_cap

['DEMAND', 'SUPPLY', 'EQUALIBRIUM', 'MARKET']


In [75]:
# Using a list comprehension
words_cap2 = [word.upper() for word in words if len(word) > 3]

words_cap2

['DEMAND', 'SUPPLY', 'EQUALIBRIUM', 'MARKET']


### `map()` function
A built-in function that allows you to process and transform all the items in an iterable without using an explicit for loop.

Syntax:
```
map(function, iterable)
```

In [76]:
list(map(len, words))

[2, 3, 6, 6, 11, 6]


## Section 3. Function

### 3.1 Define a function
- Start with the `def` keyword
- Return a value with the `return` keyword

In [77]:
def Uppercase_words(word_list):
    return [word.upper() for word in word_list if len(word) > 3]


Uppercase_words(words)

['DEMAND', 'SUPPLY', 'EQUALIBRIUM', 'MARKET']


In [78]:
def Uppercase_words2(element):
    if len(element) > 3:
        return element.upper()


list(map(Uppercase_words2, words))

[None, None, 'DEMAND', 'SUPPLY', 'EQUALIBRIUM', 'MARKET']
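The two `None` entries appear because `Uppercase_words2` reaches the end of its body without executing `return` for short words, and a Python function that does not return explicitly returns `None`. One possible way to drop them, sketched here with the built-in `filter` and the hypothetical helper names `is_long` and `to_upper`, is to filter before mapping:

```python
words = ["as", "car", "demand", "supply", "equalibrium", "market"]


def is_long(word):
    """Return True for words longer than 3 characters."""
    return len(word) > 3


def to_upper(word):
    """Return the word in uppercase."""
    return word.upper()


# filter keeps only the items for which is_long returns True
result = list(map(to_upper, filter(is_long, words)))
print(result)  # ['DEMAND', 'SUPPLY', 'EQUALIBRIUM', 'MARKET']
```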


### 3.2 Anonymous (Lambda) Functions
<font color='red'>**Do NOT use `lambda` as a variable name.**</font><br>
Python supports anonymous functions, called lambda functions, which are defined with the `lambda` keyword.<br>
The keyword itself has no meaning other than "we are declaring an anonymous function".



In [79]:
# Using a def function
def double_value(x):
    return x * 2


double_value(3)

6


In [80]:
# lambda function
equiv_anon = lambda x: x * 2

equiv_anon(3)

6


#### Exercise 1

Using a list comprehension, create a new list called "newlist" from the list "numbers". "newlist" should contain the squared value of each element in "numbers", and all of its elements should be of type float.

In [81]:
numbers = [2, 2, 2, 5, 5, 6]

# insert your code here
newlist = None

newlist


<details><summary>Click here for the solution</summary>

```python
newlist = [float(number**2) for number in numbers]
```

</details>

#### Exercise 2: Use `map` with a `lambda` function to generate "newlist"

In [82]:
numbers = [2, 2, 2, 5, 5, 6]

# insert your code here
newlist = None

newlist


<details><summary>Click here for the solution</summary>

```python
newlist = list(map(lambda x: float(x ** 2), numbers))
```

</details>

## Section 4. Files and the Operating System

### Reading Files

Most of the time, we use high-level tools like `pandas.read_csv` to read files. However, it is important to understand the basics of working with files in Python.

In [83]:
f_path = "data/thezenofpython.txt"


In [84]:
infile = open(f_path)


By default, the file is opened in read-only mode `r`.

In [85]:
print(infile)

<_io.TextIOWrapper name='data/thezenofpython.txt' mode='r' encoding='UTF-8'>



Python does not print the contents of the file, only a somewhat cryptic `TextIOWrapper` object. This is Python's way of saying it has *opened* a connection to the file `data/thezenofpython.txt`. To *read* the contents of the file, we must call the `read` method:

In [86]:
print(infile.read())

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!



In [87]:
infile = open(f_path)
text = infile.read()

<IPython.core.display.Javascript object>

The variable `text` now holds the contents of the file `data/thezenofpython.txt`, and we can access and manipulate it just like any other string. After we have read the contents of a file, the `TextIOWrapper` no longer needs to be open. It is good practice to close it explicitly as soon as you no longer need it. Closing the file releases its resources back to the operating system:

In [88]:
infile.close()
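A common alternative to calling `close` by hand is the `with` statement, which closes the file automatically when the block ends, even if an error occurs inside it. The sketch below writes and reads a throwaway file in a temporary directory so that it runs anywhere; `demo_path` and `demo.txt` are hypothetical names, not files from this course.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.txt")

with open(demo_path, mode="w", encoding="utf-8") as outfile:
    outfile.write("Beautiful is better than ugly.\n")

# The file is closed automatically when the with block ends
with open(demo_path, encoding="utf-8") as infile:
    text = infile.read()

print(text)
print(infile.closed)  # True
```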


In [89]:
# import this


<img src="images/filemode.png" alt="Drawing" style="width: 520px;"/>

### Writing results to a file

We have already seen how to read text from disk. Writing to disk is only slightly different. The following lines of code write a single sentence to the file `data/my_output.txt`.

In [90]:
f_path2 = "data/my_output.txt"
outfile = open(f_path2, mode="w")
outfile.write("My first output.")
outfile.close()


Here the mode is `w`, meaning "open the file for writing". Note that mode `w` overwrites the file if it already exists. Go ahead and open the file `my_output.txt` in the `data` folder of this course. As you can see, it contains the line `My first output.`

### Some useful commands in [`os` module](https://docs.python.org/3/library/os.html)

In [91]:
import os


In [92]:
# Get current working directory
# os.getcwd()
%pwd

'/Users/kairongchen/Google Drive/2021Summer_Python_for_Economists'


In [93]:
# Return a list containing the names of the entries in the directory
os.listdir("data")

['fundq_sample.csv',
 'fomc0621.csv',
 'my_output.txt',
 '.DS_Store',
 'pd4firm.csv',
 'tips.csv',
 'newfomc0621.csv',
 'fundq_sample.pkl',
 'thezenofpython.txt',
 'company_sample.pkl']


In [94]:
# Changing the CWD
os.chdir("../")


In [95]:
os.getcwd()

'/Users/kairongchen/Google Drive'
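Because `os.chdir` changes the directory that relative paths are resolved against, a path such as `data/thezenofpython.txt` no longer points to the same file after the cell above. The `os.path` submodule offers helpers for building and inspecting paths; the sketch below shows a few of them, with `data_dir` as an illustrative variable name.

```python
import os

data_dir = "data"  # folder name used in this notebook
f_path = os.path.join(data_dir, "thezenofpython.txt")  # joins with the OS separator

print(f_path)                    # data/thezenofpython.txt on macOS and Linux
print(os.path.basename(f_path))  # thezenofpython.txt
print(os.path.splitext(f_path))  # splits off the extension '.txt'
print(os.path.exists(f_path))    # depends on the current working directory
print(os.path.isabs(f_path))     # False: the path is relative
```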
