# Python Basics

 


This section introduces some of the most relevant aspects of working with Python for social scientists. This includes the different data types available and ways to modify them. 

According to Wikipedia, 

> Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.


To store that in Python, create a new variable called `sentence`:

In [1]:
sentence =  'Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.'

The text is surrounded by a single quote (`'`) on each side.
To make sure that you typed the sentence correctly, you can type `sentence`:

In [2]:
sentence

'Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.'

You can get almost the same response using the `print` function:



In [3]:
print(sentence)

Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.


The only difference is that the first response was wrapped in single quotes and the second wasn’t. As a side note, the single quotes did not appear because you typed them. If you had used double quotes, Python would still display single quotes.

In [4]:
sentence =  "Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes."

sentence

'Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.'

In addition to `'` and `"`, strings can also be marked with triple quotes (`'''`). This last option is particularly useful when your text contains contractions or quotation marks.

In [5]:
new_sentence = '''According to Wikipedia, "Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes."'''

In [6]:
print(new_sentence)

According to Wikipedia, "Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes."
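
As a small sketch of why triple quotes help, the string below contains both an apostrophe and double quotation marks. It also runs across two lines, which triple-quoted strings allow. The variable name `quote` and its text are made up for illustration.

```python
# Hypothetical example text: it mixes an apostrophe with double quotes.
quote = '''The guide said, "Squid don't talk,
but they do change color."'''

print(quote)

# The line break typed inside the triple quotes is stored as a newline character.
print('\n' in quote)
```

With single or double quotes alone, the apostrophe or the quotation marks would end the string early.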


<div class="alert alert-info">
<h3> Your turn</h3>
<p> Create a new string called <code>food</code>  that is a sentence about your most recent meal. Display the contents of your new string. 
</div>

#### Strings

Python has a few tools for manipulating text, such as `lower` for making the string lower-case.

In [7]:
sentence.lower()

'caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.'

This did not alter the original string, however.

In [8]:
sentence

'Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.'

In Python, strings are immutable, meaning that once created, they cannot be altered in place. We can, however, store the result in a new variable.

In [9]:
lower_sentence = sentence.lower()

lower_sentence

'caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.'
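
To see immutability directly, you can try to change one character in place. This is only a sketch, and it uses a shortened, hypothetical string called `short_sentence` so that `sentence` is left alone.

```python
short_sentence = 'Caribbean reef squid'   # hypothetical shortened example

# Assigning to a single position fails because strings are immutable.
try:
    short_sentence[0] = 'c'
    changed = True
except TypeError as error:
    changed = False
    print('TypeError:', error)

# The string is exactly as it was.
print(short_sentence)
```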

<div class="alert alert-info">
<h3> Your turn</h3>
<p> Create a new, lower cased version of your <code>food</code> string.
</div>

We can also `replace` text within the string.

In [10]:
sentence.replace("Caribbean reef squid", "Velociraptors")

'Velociraptors have been shown to communicate using a variety of color, shape, and texture changes.'

`replace` can also be used to remove text without replacement.

In [11]:
sentence.replace(".", "")

'Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes'

As before, this does not alter the original string. If you wanted to save the string edits, you would need to create a new variable.






In [12]:
edited_sentence = sentence.lower()
print(edited_sentence)

caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.


If you were doing a series of manipulations, you could reuse a variable name, although it is best practice to keep a version of the original string in case you ever need to go back to it.

In [13]:
edited_sentence = sentence.lower()
print(edited_sentence)

edited_sentence = edited_sentence.replace(".", "")
print(edited_sentence)

caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.
caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes


You can also stack multiple transformations together, although combining too many may make your code harder to follow.

In [14]:
sentence.replace(".", "").lower()

'caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes'

<div class="alert alert-info">
<h3> Your turn</h3>
<p> Create a new string called <code>boring</code> that removes the exclamation marks and capitalization from the sentence "Way to go!!!".  
</div>

### Slicing

If you had a very long text, such as the entire text of a Wikipedia article, you might only want to look at the first few characters. In Python, this is called slicing.

In [17]:
sentence

'Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.'

In [18]:
sentence[0:20]

'Caribbean reef squid'

A slice is signaled with brackets (`[]`). The first number is the starting position, where 0 indicates the beginning. This is followed by a colon (`:`) and then the end position, which, in this case, is 20. The character at the end position is not included, so this slice returns the characters at positions 0 through 19. Note that this is splitting on characters, not words.

Here is a section from the middle of the string:

In [19]:
sentence[20:32]

' have been s'

For convenience, if you omit the number before the colon, it defaults to the beginning of the string.

In [20]:
sentence[:40]

'Caribbean reef squid have been shown to '

Omitting the second number defaults to the end.

In [21]:
sentence[40:]

'communicate using a variety of color, shape, and texture changes.'

Finally, negative numbers are interpreted as distance from the end of the string.

In [22]:
sentence[-20:]

'and texture changes.'
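
The slice forms above can be gathered in one place. This sketch uses a made-up string, `text`, rather than `sentence`, and the variable names are only for illustration.

```python
text = 'social science'   # hypothetical example string, 14 characters long

first = text[0:6]         # characters at positions 0 through 5
rest = text[7:]           # from position 7 to the end
last_three = text[-3:]    # the final three characters

print(first)
print(rest)
print(last_three)

# The end position is not included, so a slice from 0 to 6 has 6 characters.
print(len(first))
```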

<div class="alert alert-info">
<h3> Your turn</h3>
<p> Create a new string called <code>s</code> that contains <code>The weather is hot and humid today.</code> Find the slices for each of the following :
<ul>
    <li> <code>The we</code> </li>
    <li> <code>today.</code>
    <li> <code>hot and humid</code>
</ul>

</div>

#### Numbers

We can also count the number of characters in a string with the `len` function.

In [23]:
len(sentence)

105

In this case, Python returned an integer instead of a string. This can also be stored in a variable.

In [24]:
sentence_length = len(sentence)

In [25]:
sentence_length

105

<div class="alert alert-info">
<h3> Your turn</h3>
<p> What is the length of <code>How many dogs do you own?</code>? Store it in a variable called <code>sl</code>.

</div>

Since this is a number, we can do standard math operations with it.

In [26]:
print(sentence_length * 3)

315


In [27]:
print(sentence_length / 2)

52.5


In [28]:
print(sentence_length + sentence_length)

210
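
Notice that `/` returned `52.5`, a number with a decimal point (a float) rather than an integer. As a brief sketch of two related operators, with the length typed in directly so the cell stands on its own:

```python
sentence_length = 105   # the character count found above, typed in directly

half = sentence_length / 2          # true division always gives a float
whole_half = sentence_length // 2   # floor division drops the fractional part
left_over = sentence_length % 2     # remainder after dividing by 2

print(half, whole_half, left_over)
```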


<div class="alert alert-info">
<h3> Your turn</h3>
<p> What is one-third of <code>sl</code>?

</div>

As with strings, these can be saved in new variables.

In [29]:
double_length = sentence_length + sentence_length

print(double_length)

210


The `+` and `*` operators also work with strings.

In [30]:
print(sentence * 2)

Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.


In [31]:
print(sentence + sentence)

Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.Caribbean reef squid have been shown to communicate using a variety of color, shape, and texture changes.


The `+` operator can't be used to combine a string and an integer, however.

In [32]:
print("The sentence was " + sentence_length + "characters.")

TypeError: can only concatenate str (not "int") to str

Conveniently, Python's `str` function will convert an integer to a string.

In [33]:
print("The sentence was " + str(sentence_length) + " characters.")

The sentence was 105 characters.


I had to manually include the spaces before and after `sentence_length`. Otherwise, it is all smushed together.

In [34]:
print("The sentence was" + str(sentence_length) + "characters.")

The sentence was105characters.
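
An alternative that avoids both the `str` conversion and the manual joining is a formatted string literal, or f-string, available in Python 3.6 and later. This is only a sketch, with the length typed in directly; `message` is a name chosen for illustration.

```python
sentence_length = 105   # the character count found above, typed in directly

# Anything inside the curly brackets is evaluated and converted to text.
message = f"The sentence was {sentence_length} characters."
print(message)
```

The spaces still have to be typed, but they sit inside one string, which makes them easier to see.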


<div class="alert alert-info">
<h3> Your turn</h3>
<p>Print <code>The length of the word "hippopotamus" is [x].</code> where <code>[x]</code> is the length of the word hippopotamus.

</div>

#### Lists

We can also `split` the sentence into a series of strings. By default, this splits based on spaces and other whitespace characters such as a line break (`\n`) or tab character (`\t`). 

In [35]:
print(sentence.split())

['Caribbean', 'reef', 'squid', 'have', 'been', 'shown', 'to', 'communicate', 'using', 'a', 'variety', 'of', 'color,', 'shape,', 'and', 'texture', 'changes.']


What is returned here is a third data type (the first two were strings and integers) called a list. A list is enclosed in brackets (`[]`) and the items are separated by commas. In this case each item is in quotation marks because they are all strings. Items in a list, however, can be of any sort.

In [36]:
my_list = ['Speeches', 7, 'Data']
my_list

['Speeches', 7, 'Data']

While `len` returned the number of characters in a string, it returns the number of items in a list.

In [37]:
len(my_list)

3

In [38]:
sentence_length = len(sentence.split())
sentence_length

17

In the second example, the list created by `sentence.split()` is not saved in any way; only its length is.
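
Returning to `split` for a moment: it can also take the text to split on as an argument, and `join` does the reverse by gluing a list of strings back together. A minimal sketch, using a made-up comma-separated string:

```python
row = 'female,1,College'   # hypothetical comma-separated values

fields = row.split(',')    # split on commas instead of whitespace
print(fields)
print(len(fields))

# join is called on the separator and takes the list as its argument.
rejoined = ' | '.join(fields)
print(rejoined)
```

Every item that comes back from `split` is a string, including the `'1'`.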


<div class="alert alert-info">
<h3> Your turn</h3>
<p> Create a list called <code>food</code> that includes at least three things you ate today. Use <code>len</code> to count the number of items in the list.

</div>



Like strings, lists can also be sliced. The first three items of a list:

In [39]:
words = sentence.split()
print(words[:3])

['Caribbean', 'reef', 'squid']


We can also extract specific items from a list by their position. As it did with strings, counting in Python starts with 0.

In [40]:
words[0]

'Caribbean'

The fourth word:

In [41]:
words[3]

'have'

The fifth word from the end:

In [42]:
words[-5]

'color,'

The last two words, returned as a list:

In [44]:
words[-2:]

['texture', 'changes.']


<div class="alert alert-info">
<h3> Your turn</h3>
<p>Display the first two items of your <code>food</code> list. What is the last item?

</div>



Unlike strings, lists are mutable. That means that we can remove things from a list or, as is more frequently the case in text analysis, add things to it. Adding is done with `append`.

In [45]:
male_words = ['his', 'him', 'father']
male_words.append('brother')
print(male_words)

['his', 'him', 'father', 'brother']


Since `append` changes `male_words` in place, we do not want to use an `=`. The Python interpreter edits our original list but does not return anything.

In [46]:
not_going_to_work = male_words.append('brother')
print(not_going_to_work)

None
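
Because lists are mutable, items can also be replaced by position or taken out with `remove`. This sketch uses a fresh, made-up list called `colors` so that `male_words` is left alone.

```python
colors = ['red', 'green', 'blue']   # hypothetical list for illustration

colors[0] = 'crimson'     # replace the item at position 0
colors.remove('green')    # take out the first item equal to 'green'
print(colors)

# Like append, remove edits the list in place and returns None.
result = colors.append('teal')
print(result)
print(colors)
```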


Lists can also be combined using `+`.

In [47]:
gendered_words = male_words + ['her', 'she', 'mother']
print(gendered_words)

['his', 'him', 'father', 'brother', 'brother', 'her', 'she', 'mother']
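
Unlike `append`, `+` builds a new list and leaves the original lists as they were, which is why the result needs to be stored with `=`. A short sketch with made-up lists:

```python
first_half = ['a', 'b']    # hypothetical lists for illustration
second_half = ['c', 'd']

combined = first_half + second_half   # a new list; neither original changes
print(combined)
print(first_half)
```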


As noted above, the items in a list can include a variety of data types. This includes lists.

In [48]:
gendered_lists = [ male_words ,  ['her', 'she', 'mother'] ]

Note the two closing brackets next to each other. The first closes the list that ends with 'mother' while the second closes our `gendered_lists`.

In [49]:
len(gendered_lists)

2

`gendered_lists` has a length of two because it contains just two items, each a list of varying lengths.

In [50]:
print(gendered_lists)

[['his', 'him', 'father', 'brother', 'brother'], ['her', 'she', 'mother']]
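
Items inside a nested list can be reached by using two sets of brackets, one after the other. A minimal sketch, with a hypothetical `word_lists` typed out in full so that it runs on its own:

```python
word_lists = [['his', 'him', 'father'], ['her', 'she', 'mother']]

second_list = word_lists[1]      # the whole inner list at position 1
first_word = word_lists[1][0]    # the first item of that inner list

print(second_list)
print(first_word)
print(len(word_lists[0]))        # the length of the first inner list
```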



<div class="alert alert-info">
<h3> Your turn</h3>
<p> Add three more items to your <code>food</code> list. Use <code>append</code> for one.
For the other two, place them in a list and then combine the two lists.

</div>



#### Dictionaries

A fourth useful data type is a dictionary. A dictionary is like a list in that it holds multiple items. The items in a list can be identified by their position in the list. In contrast, the values in a dictionary are associated with a keyword. The analogy here is to a physical dictionary, which has a list of unique words, and each word has a definition. In this case, the entries are called keys, and the definitions, which can be any data type, are called values.

Alternatively, you can think of a dictionary as a single row of data from a dataset, where the keys are the variable names.

In [51]:
respondent = {'sex'   : "female",
              'abany' : 1,
              'educ'  : 'College'}

Dictionaries are surrounded by curly brackets (`{}`). Each entry is a pair consisting of the key, which is usually a string, followed by a colon and then the value. As in a list, entries are separated by commas.



We can access the contents of a dictionary by enclosing the key in brackets (`[]`).

In [52]:
respondent['sex']

'female'

If the key is not in the dictionary, you will get a `KeyError`.

In [53]:
respondent['gender']

KeyError: 'gender'
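
If you are not sure whether a key exists, you can check with `in` or ask for it with `get`, which does not raise an error. A sketch, with the dictionary typed out again so the cell is self-contained:

```python
respondent = {'sex': 'female', 'abany': 1, 'educ': 'College'}

print('gender' in respondent)   # False: the key is missing
print('sex' in respondent)      # True

# get returns None, or a fallback you supply, instead of raising a KeyError.
print(respondent.get('gender'))
print(respondent.get('gender', 'not asked'))
```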

You can inspect all the keys in a dictionary, in case you forgot or someone else made it.

In [54]:
respondent.keys()

dict_keys(['sex', 'abany', 'educ'])

In [55]:
len(respondent.keys())

3

Dictionaries are mutable, so we can change the value of existing keys, remove keys, or add new ones.

In [56]:
respondent['race'] = 'Black'

print(respondent)

{'sex': 'female', 'abany': 1, 'educ': 'College', 'race': 'Black'}


In [57]:
respondent['abany'] = 'Yes'

print(respondent)

{'sex': 'female', 'abany': 'Yes', 'educ': 'College', 'race': 'Black'}
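
Removing a key, which was mentioned but not shown, can be done with `del` or `pop`. This sketch works on a separate, hypothetical dictionary called `respondent_copy` so that `respondent` stays intact for the next steps.

```python
respondent_copy = {'sex': 'female', 'abany': 'Yes',
                   'educ': 'College', 'race': 'Black'}

del respondent_copy['race']            # delete the key and its value
educ = respondent_copy.pop('educ')     # remove the key and hand back its value

print(educ)
print(respondent_copy)
```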


<div class="alert alert-info">
<h3> Your turn</h3>
<p>Add a new key to the dictionary called <code>age</code> with a value of 37. Confirm that you did it correctly by displaying the value of <code>age</code>.

</div>


As noted above, while the keys are usually strings, the values can be any data type.

In [58]:
respondent['children ages'] = [3, 5, 10]

print(respondent)

{'sex': 'female', 'abany': 'Yes', 'educ': 'College', 'race': 'Black', 'children ages': [3, 5, 10]}


#### Spaces


Within the Python community, there are strong norms about how code should be written. Many of these center on making code readable, both by others and by your future self. As a trivial example, `2+2` is allowed, but is almost always written `2 + 2`. Likewise, I defined my respondent dictionary with plenty of white space in order to maximize readability.

In [59]:
respondent = {'sex'   : "female",
              'abany' : 1,
              'educ'  : 'College'}

This is identical to:

In [60]:
respondent={'sex':'female','abany':1,'educ':'College'}




but putting it all on one line obscures the logic of the dictionary. In this case, what is a key and what is a value is quite clear in the first version, while distinguishing between the two is more problematic in the single-line version. 



In [61]:
r2 = {'sex':'male',   'abany':1, 'educ':'College'     }
r3 = {'sex':'female', 'abany':0, 'educ':'High School' }
r4 = {'sex':'male',   'abany':0, 'educ':'Some College'}

In [62]:
respondents = [respondent, r2, r3, r4]

In [63]:
respondents

[{'sex': 'female', 'abany': 1, 'educ': 'College'},
 {'sex': 'male', 'abany': 1, 'educ': 'College'},
 {'sex': 'female', 'abany': 0, 'educ': 'High School'},
 {'sex': 'male', 'abany': 0, 'educ': 'Some College'}]

This now looks a lot like the common data format JSON!
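
The resemblance is close enough that Python's built-in `json` module can convert between the two. A minimal sketch, using a shortened, hypothetical list called `sample_respondents`:

```python
import json

sample_respondents = [{'sex': 'female', 'abany': 1, 'educ': 'College'},
                      {'sex': 'male', 'abany': 0, 'educ': 'High School'}]

as_text = json.dumps(sample_respondents)   # list of dictionaries -> JSON string
print(as_text)

round_trip = json.loads(as_text)           # JSON string -> list of dictionaries
print(round_trip == sample_respondents)
```

One visible difference is that JSON always uses double quotes around strings.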

### Loops

In [64]:
for person in respondents:
    print(person['educ'])

College
College
High School
Some College


In [65]:
for item in [1,2,'bobcat']:
    print(item)

1
2
bobcat
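
A `for` loop runs the indented lines once for each item in a list, with the loop variable (`person` and `item` above) standing in for the current item. Loops are often paired with `append` to build up a new list. A sketch, using a small made-up list of dictionaries called `sample`:

```python
sample = [{'sex': 'female', 'educ': 'College'},
          {'sex': 'male', 'educ': 'High School'},
          {'sex': 'male', 'educ': 'College'}]

educ_levels = []                          # start with an empty list
for person in sample:
    educ_levels.append(person['educ'])    # runs once per dictionary

print(educ_levels)
print(len(educ_levels))
```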



<div class="alert alert-info">
<h3> Your turn</h3>
<p> Loop over the items in your <code>food</code> list. For each item, print its length.

</div>





### Functions

For those who come from a Stata or R background, one of the more striking aspects of Python code is the frequency of user-defined functions. They are deployed not just for things where you think there should be a function, like counting words in a sentence, but also for highly custom situations, such as scraping the contents of a particular web page. This style of programming, with many small functions, tends to make code more readable and easier to debug than code written in a more traditional social science style.

A standard function has three parts. First, the function is named and defined. The subsequent line or lines do the actual work. Finally, the results are returned.

A trivial function that returns `Hello!` might look like:

In [66]:
def make_hello():
    word = 'Hello!'
    return word

Of note, `def` signals that you are defining a function. This is followed by the name of the function, in this case `make_hello`. Since this function doesn't take any arguments, such as a variable to modify or any options, it is followed by `()`. The first line ends with a colon.

All subsequent lines are indented. The second line creates a new string variable called `word` which contains `Hello!`. The third and final line of the function returns the value stored in `word`.

In [67]:
make_hello()

'Hello!'

More commonly in text analysis, a user-defined function modifies an existing string. In this case, the variable name that will be used within the function is established within the parentheses on the opening line.

A second trivial function takes a text string and returns an all-caps version. 

In [68]:
def scream(text):
    text_upper = text.upper()
    return text_upper

In [69]:
scream('Hi there!')

'HI THERE!'

![](https://raw.githubusercontent.com/nealcaren/UiOBigData/master/notebooks/images/function.png)

The `text` and `text_upper` variables only exist within the function. That means that you can pass a variable not called `text` to the function.

In [70]:
scream(sentence)

'CARIBBEAN REEF SQUID HAVE BEEN SHOWN TO COMMUNICATE USING A VARIETY OF COLOR, SHAPE, AND TEXTURE CHANGES.'

It also means that everything but the returned value disappears once the function finishes.

In [71]:
text_upper

NameError: name 'text_upper' is not defined

It is a good idea to include a short description, known as a docstring, at the top of the function that explains what it does. This is helpful for other people reading your code and when you return to your own code days or months later.

In [72]:
def scream(text):
    '''Returns an all-caps version of text string.'''
    text_upper = text.upper()
    return text_upper
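
Python keeps a description string like this attached to the function, where `help` and the `__doc__` attribute can find it. A sketch with a hypothetical `count_words` function, picking up the word-counting example mentioned earlier:

```python
def count_words(text):
    '''Returns the number of whitespace-separated words in a text string.'''
    words = text.split()
    return len(words)

print(count_words('Caribbean reef squid'))

# The description is stored on the function and is what help() displays.
print(count_words.__doc__)
```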

<div class="alert alert-info">
<h3> Your turn</h3>
<p> Make a function called "whisper" that replaces all exclamation marks with a period and returns a lowercase version of a string. Test it out.
</div>

In [73]:
def whisper(text):
    ''''''
    
    return quiet_text

<div class="alert alert-info">
<h3> Your turn</h3>
<p> What did we learn today? Make a list of all the commands you encountered.
</div>