### Introduction to Python Basics

Adapted from Melanie Walsh's "Python" module in [*Introduction to Cultural Analytics and Python* (2020)](https://melaniewalsh.github.io/Intro-Cultural-Analytics/Python/Python.html)

This introductory tutorial is broken up into 6 short parts, which are linked below:

- [Python and Jupyter Notebook Tips](#Python-and-Jupyter-Notebook-Tips:)
- [1. Anatomy of a Python Script](#1.-Anatomy-of-a-Python-Script)
- [2. Variables](#2.-Variables)
- [3. Data Types](#3.-Data-Types)
- [4. String Methods](#4.-String-Methods)
- [5. Conditionals and Comparisons](#5.-Conditionals-and-Comparisons)
- [6. Lists and Loops](#6.-List-and-Loops)


## Python and Jupyter Notebook Tips:

This notebook is a Jupyter Notebook. You can interact with it in a few ways: 

1. You can click on the Binder version (this is hosted on a cloud server)
2. You can run it on your own machine through JupyterLab:
	- Download this notebook and this folder of data.
	- I encourage you to learn to create Jupyter notebooks on your own machine; this will give you a little more control over writing and saving your own Python code.

###  Pro Tips:
- Running a cell in JupyterLab: Click on the cell, then click ► (the "Run" icon) in the menu at the top of this notebook
- `Tab` completion. 
    - Like the command line, Jupyter supports tab completion
    - Pressing the `tab` key on your keyboard will allow you to search for any variables that you've already defined, as well as matching functions or modules within Python.
- Run cells in order!
    - Python executes code in the order that it's written. This means that some parts of code will depend on parts written earlier. If you get an error, it may mean that you simply haven't defined a variable or function. Make sure to run code in the sequence it's written.


------


## 1. Anatomy of a Python Script

Below is a chunk of Python code. These lines, when put together, do something simple yet important. They count and display the most frequent words in a text file. (If basic forms of text analysis like counting word frequency sound familiar, this is because what we'll be learning to do with Python builds on the kinds of commands you learned to use with the command line.)

The example below specifically counts and displays the 40 most frequent words in Franklin Delano Roosevelt's 1933 inaugural address:

In [None]:
# word-frequencies.py

# Import Libraries and Modules

import re
from collections import Counter

# Define Functions

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

# Define Filepaths and Assign Variables

filepath_of_text = "US_Inaugural_Addresses/37_roosevelt_franklin_1933.txt"
number_of_desired_words = 40

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']

# Read in File

full_text = open(filepath_of_text, encoding="utf-8").read()

# Manipulate and Analyze File

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in stopwords]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

# Output Results

most_frequent_meaningful_words

This might look intimidating, but we can reverse engineer the different parts of a Python script.

Most Python scripts have the same basic anatomy:

### Import Libraries and Modules

Let's start with those first lines in our script:


In [1]:
import re
from collections import Counter

We call the code written and packaged up by other people a "library," "package," or "module." We'll talk more about them in a later lesson. For now simply know that you import libraries/packages/modules at the very top of a Python script for later use.

- `Counter` will help us count words
- `re`, short for regular expressions, is basically a fancy find-and-replace that will help us split Roosevelt's 1933 address into individual words and get rid of trailing punctuation



### Define Functions

After importing any libraries with pre-defined functions that we might be using, we define our own functions. The example below is a function designed to split a text into words and turn all of the words lowercase:

In [2]:
def split_into_words(any_chunk_of_text):
    words = re.split(r"\W+", any_chunk_of_text.lower())
    return words 



Here we're making a function called `split_into_words`, which takes in any chunk of text, transforms that text to lower-case, and splits the text into a list of clean words without punctuation or spaces. We're not actually using the function yet.
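
To see what such a function returns, here is a minimal sketch that applies the same logic to a short made-up sentence (the sample sentence and the names `sample_sentence` and `sample_words` are assumptions for illustration, not part of the Roosevelt text). The pattern is written as a raw string, `r"\W+"`, which means "one or more non-word characters."

```python
import re

def split_into_words(any_chunk_of_text):
    # Lowercase the text, then split wherever one or more non-word characters occur
    return re.split(r"\W+", any_chunk_of_text.lower())

sample_sentence = "Hello, world! Hello again."  # made-up text for illustration
sample_words = split_into_words(sample_sentence)
print(sample_words)
# ['hello', 'world', 'hello', 'again', '']
```

Notice the empty string at the end of the list: because the sentence ends with punctuation, the split leaves an empty final piece. It is harmless here, but it is worth knowing about when you count words.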


### Define Filepaths and Assign Variables

Next, we define filepaths and assign variables that we'll use later in our script:

In [3]:
filepath_of_text = "US_Inaugural_Addresses/37_roosevelt_franklin_1933.txt"
number_of_desired_words = 40
stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']

What did we just do?

We created a variable called `filepath_of_text` and assigned it the filepath of the text that we'll be analyzing (Roosevelt's 1933 address) so that we can read it in.

We set the `number_of_desired_words` to 40 so we can look at the top words.

And we made a variable called `stopwords` and assigned it a list of common English-language "stop words," that is, a list of some of the most frequently occurring English-language words. Stop words are typically removed from a text before computational analysis in order to shift the focus to less frequently occurring, more "meaningful" words.

> Questions for reflection:  
> 1. What are the consequences of excluding stopwords from our analysis?  
> 2. Look back at our list of stopwords. When might we want to include such words in analyzing a document like a presidential speech?

### Ok, now what can we do with these functions and variables?

### Read in a File

Here's a quick script that reads in our file (which we set to Roosevelt's 1933 address) and assigns it to the variable `full_text`:

In [4]:
full_text = open(filepath_of_text, encoding="utf-8").read()

### Manipulate and Analyze a File
To count the words in our speech, we'll need to break the text into words. This is where our `split_into_words` function comes in handy. Below we call the function `split_into_words`, which we created earlier, and use it to split the `full_text` of the speech into individual words. Then we assign this value to the variable `all_the_words`.

In [5]:
all_the_words = split_into_words(full_text)

We can then remove stopwords. Below we use a `for` loop (more on this in a later lesson!), written in a compact form called a list comprehension, to cycle through `all_the_words`, keep only the words that are not stopwords, and assign the result to a new variable:

In [6]:
meaningful_words = [word for word in all_the_words if word not in stopwords] # remove the stopwords from all_the_words
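
The one-line filter above packs a loop and a condition into a single expression. As a rough sketch of what it is doing, the example below writes the same filter as an explicit `for` loop and compares the two results. The names `toy_words`, `toy_stopwords`, `kept_with_loop`, and `kept_with_comprehension` are made up for illustration so that they don't overwrite the variables used in this lesson.

```python
# Toy data, assumed for illustration only
toy_words = ["the", "only", "thing", "we", "have", "to", "fear"]
toy_stopwords = ["the", "only", "we", "have", "to"]

# Version 1: an explicit for loop with an if statement
kept_with_loop = []
for word in toy_words:
    if word not in toy_stopwords:
        kept_with_loop.append(word)

# Version 2: the same filter written as a one-line list comprehension
kept_with_comprehension = [word for word in toy_words if word not in toy_stopwords]

print(kept_with_loop)            # ['thing', 'fear']
print(kept_with_comprehension)   # ['thing', 'fear']
```

Both versions produce the same list; the one-line form is simply shorter to write.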

Now we can count!

Here we plug our new list of words, `meaningful_words`, into our `Counter`, which gives us a tally of how many times each word appears in the speech.

In [7]:
meaningful_words_tally = Counter(meaningful_words) # Here we're using Counter, which we imported earlier from the collections module


### Output Results

Lastly, we pull out the top 40 most frequently occurring words from our complete tally. We make one final variable and grab our top `number_of_desired_words`, which we previously established as 40.


In [8]:
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)
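
If it helps to see `Counter` and `.most_common()` on a smaller scale, here is a minimal sketch using a short made-up list of words (the list and the names `toy_tally` and `toy_top_two` are assumptions for illustration, not taken from the speech).

```python
from collections import Counter

# A tiny made-up word list, for illustration only
toy_tally = Counter(["nation", "people", "nation", "action", "people", "nation"])

# Look up the count for a single word
print(toy_tally["nation"])   # 3

# Ask for the 2 most frequent words, as a list of (word, count) pairs
toy_top_two = toy_tally.most_common(2)
print(toy_top_two)           # [('nation', 3), ('people', 2)]
```

Each item in the result is a pair: the word first, then the number of times it appeared.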

Now we display the most frequent meaningful words:

In [9]:
most_frequent_meaningful_words 

[('national', 9),
 ('people', 8),
 ('may', 8),
 ('must', 8),
 ('leadership', 7),
 ('helped', 7),
 ('shall', 7),
 ('nation', 6),
 ('us', 6),
 ('action', 6),
 ('world', 6),
 ('time', 5),
 ('money', 5),
 ('great', 4),
 ('first', 4),
 ('efforts', 4),
 ('every', 4),
 ('days', 4),
 ('face', 4),
 ('values', 4),
 ('public', 4),
 ('hand', 4),
 ('task', 4),
 ('emergency', 4),
 ('old', 4),
 ('upon', 4),
 ('congress', 4),
 ('measures', 4),
 ('respects', 4),
 ('discipline', 4),
 ('duty', 4),
 ('need', 3),
 ('dark', 3),
 ('essential', 3),
 ('spirit', 3),
 ('common', 3),
 ('government', 3),
 ('trade', 3),
 ('important', 3),
 ('return', 3)]

Or we could use the `print` function to print our most frequent words:

In [10]:
print(most_frequent_meaningful_words)

[('national', 9), ('people', 8), ('may', 8), ('must', 8), ('leadership', 7), ('helped', 7), ('shall', 7), ('nation', 6), ('us', 6), ('action', 6), ('world', 6), ('time', 5), ('money', 5), ('great', 4), ('first', 4), ('efforts', 4), ('every', 4), ('days', 4), ('face', 4), ('values', 4), ('public', 4), ('hand', 4), ('task', 4), ('emergency', 4), ('old', 4), ('upon', 4), ('congress', 4), ('measures', 4), ('respects', 4), ('discipline', 4), ('duty', 4), ('need', 3), ('dark', 3), ('essential', 3), ('spirit', 3), ('common', 3), ('government', 3), ('trade', 3), ('important', 3), ('return', 3)]


We can also output our results to a text file:

In [11]:
with open("most-frequent-words-37_roosevelt_franklin_1933.txt", "w", encoding="utf-8") as file_object: # This code opens a new file to write to, then describes what to write to it
    file_object.write(str(most_frequent_meaningful_words))

### Add comments

Lines that begin with a hash symbol `#` are ignored when the code runs. You can thus use a hash symbol `#` to insert human-language comments directly into the code: notes or instructions to yourself and others.

It's good practice to add lines of comments on their own, or at the end of a line of code explaining what a step does. See the comment in the cell above explaining the code to write a new file object.

In some cases, you might want to write a long comment. To insert a multi-line comment, you can place the comment between triple quotation marks `""" """`. Or you can begin each line of the comment with its own hash symbol, as in the cell below.
 

In [12]:
# This line is a comment

# This line is also a comment 

# Comments can be useful to describe what a line of code is doing
# Don't assume that what you're doing will be legible to others, or to your future self!

-----

## 2. Variables

These are crucial building blocks of any Python code.

Variables are where you store filenames, words, numbers, lists of words and numbers.

Variables are "assigned" values with the `=` sign.

*NOTE*: In Python, `=` is what allows us to assign variables, while `==` (a double equals sign) is the "real" equals sign: it tests whether two values are equal.

In [13]:
new_variable = 42
print(new_variable)

42


In [14]:
another_variable = "I'm another variable!"
print(another_variable)

I'm another variable!
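
As a quick sketch of the difference between `=` and `==` noted above, the example below assigns a value with `=` and then tests it with `==`. The name `toy_variable` and the result names are made up for illustration.

```python
toy_variable = 42                        # a single = assigns the value 42

is_forty_two = toy_variable == 42        # a double == asks "are these equal?"
is_forty_three = toy_variable == 43

print(is_forty_two)     # True
print(is_forty_three)   # False

toy_variable = 43                        # assigning again replaces the old value
print(toy_variable == 43)                # True
```

Assignment changes what a variable holds; comparison only asks a question and answers True or False.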


Let's look at the word frequencies script we wrote again:

In [15]:
# word-frequencies.py

# Import Libraries and Modules

import re
from collections import Counter

# Define Functions

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

# Define Filepaths and Assign Variables

filepath_of_text = "US_Inaugural_Addresses/37_roosevelt_franklin_1933.txt"
number_of_desired_words = 40

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']

# Read in File

full_text = open(filepath_of_text, encoding="utf-8").read()

# Manipulate and Analyze File

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in stopwords]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

# Output Results

most_frequent_meaningful_words

[('national', 9),
 ('people', 8),
 ('may', 8),
 ('must', 8),
 ('leadership', 7),
 ('helped', 7),
 ('shall', 7),
 ('nation', 6),
 ('us', 6),
 ('action', 6),
 ('world', 6),
 ('time', 5),
 ('money', 5),
 ('great', 4),
 ('first', 4),
 ('efforts', 4),
 ('every', 4),
 ('days', 4),
 ('face', 4),
 ('values', 4),
 ('public', 4),
 ('hand', 4),
 ('task', 4),
 ('emergency', 4),
 ('old', 4),
 ('upon', 4),
 ('congress', 4),
 ('measures', 4),
 ('respects', 4),
 ('discipline', 4),
 ('duty', 4),
 ('need', 3),
 ('dark', 3),
 ('essential', 3),
 ('spirit', 3),
 ('common', 3),
 ('government', 3),
 ('trade', 3),
 ('important', 3),
 ('return', 3)]

In the above, our variables include:

- `filepath_of_text`
- `stopwords`
- `number_of_desired_words`
- `full_text`


Variable names can be as long or as short as you want, and they can include:

- upper and lower-case letters (A-Z)

- digits (0-9)

- underscores (_)

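However, variable names cannot include:

- other punctuation (-.!?@)

- spaces ( )

- a digit as the first character (`2nd_word` is not allowed, but `word2` is)

- a reserved Python word (like `if`, `for`, or `True`)

Built-in names like `print` are not reserved, but using one as a variable name overwrites the built-in, so it's best to avoid them too.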
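
One way to check a candidate name against the rules above is sketched below. It uses the string method `.isidentifier()` together with `keyword.iskeyword()` from Python's standard library; the candidate names and the dictionary `name_checks` are made up for illustration.

```python
import keyword

# Made-up candidate names, for illustration only
candidate_names = [
    "number_of_desired_words",  # letters and underscores
    "word2",                    # a digit, but not at the start
    "2nd_word",                 # starts with a digit
    "desired-words",            # contains punctuation
    "desired words",            # contains a space
    "True",                     # a reserved word
    "for",                      # a reserved word
]

name_checks = {}
for candidate in candidate_names:
    # A usable name must have the right shape AND must not be a reserved word
    name_checks[candidate] = candidate.isidentifier() and not keyword.iskeyword(candidate)
    print(f"{candidate!r}: {name_checks[candidate]}")
```

Only the first two names pass; each of the others breaks one of the rules listed above.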


### Jupyter Display

We can display the contents of a variable by simply running a cell that ends with the variable's name. This is a feature of Jupyter: it automatically displays the value of the last expression in a cell.

In [16]:
filepath_of_text

'US_Inaugural_Addresses/37_roosevelt_franklin_1933.txt'

### Your turn! 


We can also modify our script! This next exercise will show you how we can use the modularity of Python code to change what text we're analyzing and the variable settings.

## Exercise 1a:

Now it's your turn to change some variables and calculate a new word frequency! First, pick a new text file from our `US_Inaugural_Addresses` folder.

Then, assign `filepath_of_text` to the corresponding filepath. (Remember that you must include the path to the folder enclosing it!)

Try to change the `number_of_desired_words` as well:

In [None]:
import re
from collections import Counter

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = #Insert a New Text File Here (remember to include the enclosing folder, US_Inaugural_Addresses)
number_of_desired_words = #Change number of desired words

#Explore how the stopwords below affect word frequency by adding or removing stopwords
stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']

full_text = open(filepath_of_text, encoding="utf-8").read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in stopwords]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

most_frequent_meaningful_words

## Exercise 1b:

Let's spend a little more time thinking about those stopwords we've been removing from our text.

Using our code above, and one of the other inaugural addresses you've been looking at, try removing ALL of the stopwords from the `stopwords` list, so that no words are filtered out. How does this change our list of `most_frequent_meaningful_words`? What might we have missed in assuming that the stopwords were not "meaningful" words?

Write your reflection below:

**=> Double Click HERE to type in a text box**

----

## 3. Data Types


There are four basic types of data in Python, each of which can be used to do different things: 

- Strings (Text), e.g., `"37_roosevelt_franklin_1933.txt"`
- Integers (Whole Numbers), e.g., `40`
- Floats (Decimal Numbers), e.g., `40.3`
- Booleans (True/False), e.g., `False`


You can check a data type using `type`. Try checking the data type by running the cells below.

In [17]:
type("yellow")

str

In [18]:
type(filepath_of_text)

str

In [19]:
type(40)

int

In [20]:
type(number_of_desired_words)

int

In [21]:
type(40.3)

float

In [22]:
type(False)

bool

### Strings


A *string* is a Python data type that is treated like text, even if it contains a number. Strings are always enclosed by either single quotation marks `'this is a string'` or double quotation marks `"this is a string"`.

There are special things that you can do with strings. You can *index* them, *slice* them, add them together, and change their case, as the examples below show.


In [23]:
"this is also a string, even though it contains a number like 42"

'this is also a string, even though it contains a number like 42'
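
To make "treated like text, even if it contains a number" concrete, here is a small sketch comparing a number stored as a string with the same number stored as an integer. The variable names are made up for illustration.

```python
text_number = "42"   # a string, because of the quotation marks
real_number = 42     # an integer

joined_text = text_number + "1"       # adding strings glues them together
added_number = real_number + 1        # adding integers does arithmetic
converted = int(text_number) + 1      # int() turns the text "42" into the number 42

print(joined_text)    # 421
print(added_number)   # 43
print(converted)      # 43
```

The quotation marks, not the characters inside them, decide whether Python treats a value as text or as a number.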

#### String Methods

In [24]:
lemonade_snippet = "Hold up, they don't love you like I love you"

##### Index

In [25]:
lemonade_snippet[0] # gives us a character, given an index position 

'H'

The above gives us the character starting at the first index position, 0.

NOTE: in Python, indexes start at 0 rather than 1

##### Slice
We can use slice to get the first 20 characters:

In [26]:
lemonade_snippet[0:20]

"Hold up, they don't "

##### Add
We can add another string to our string, `lemonade_snippet`

In [27]:
lemonade_snippet + " // Slow down, they don't love you like I love you"

"Hold up, they don't love you like I love you // Slow down, they don't love you like I love you"

##### Make uppercase

In [28]:
lemonade_snippet.upper()

"HOLD UP, THEY DON'T LOVE YOU LIKE I LOVE YOU"

#### f-Strings
An f-string, short for formatted string literal, allows you to insert a variable directly into a string.

An f-string must begin with an `f` outside the quotation marks. Then, inside the quotation marks, the inserted variable must be placed within curly brackets `{}`.

In [29]:
print(f"Beyonce burst out of the building and sang: \n\n'{lemonade_snippet}'")

Beyonce burst out of the building and sang: 

'Hold up, they don't love you like I love you'


What does `\n` mean in the statement above?
`\n` is a symbol for "new line".
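
The curly brackets in an f-string can hold more than a bare variable name. As a small sketch (with made-up values and hypothetical variable names), the example below inserts several variables at once and uses the format code `:.1%` to show a fraction as a percentage with one decimal place.

```python
# Made-up values, for illustration only
toy_word = "nation"
toy_count = 3
toy_total = 12
toy_share = toy_count / toy_total   # 0.25

toy_summary = f"'{toy_word}' appears {toy_count} times out of {toy_total} words ({toy_share:.1%})"
print(toy_summary)
# 'nation' appears 3 times out of 12 words (25.0%)
```

Format codes like this one are optional; without them, the f-string simply inserts the value as it is.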

### Integers & Floats


An integer and a float (short for floating point number) are two Python data types for representing numbers. Integers represent whole numbers. Floats represent numbers with decimal points. They do not need to be placed in quotation marks.


In [30]:
type(40)

int

In [31]:
type(40.5)

float

We can do mathematical calculations with both integers and floats.

Multiplication:

In [32]:
variable1 = 4
variable2 = 2
variable1 * variable2

8

Exponents:

In [33]:
variable1 ** variable2

16

Remainder:

In [34]:
72 % 10

2
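
Division is worth a quick look of its own, because it is where integers and floats meet. The sketch below (with made-up variable names) contrasts regular division `/`, which always gives a float, with floor division `//`, which drops the remainder.

```python
regular_division = 7 / 2     # regular division always returns a float
floor_division = 7 // 2      # floor division drops the remainder
even_division = 8 / 2        # still a float, even though it divides evenly
mixed_product = 4 * 2.5      # mixing an integer and a float gives a float

print(regular_division)   # 3.5
print(floor_division)     # 3
print(even_division)      # 4.0
print(mixed_product)      # 10.0
```

Together, `//` and `%` give you the whole-number part and the remainder of a division.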

### Booleans
Booleans are a data type for logical "true/false" statements. They report whether something is True or False, based on variables you've assigned.

For example, we can assign a variable called `beyonce` the value "Grammy award-winner":

In [35]:
beyonce = "Grammy award-winner"

Now we can test whether the variable `beyonce` equals "Grammy award-winner"

In [36]:
beyonce == "Grammy award-winner" # Notice: here we're using the == sign, the "real" equals sign in Python

True

We can also test whether `beyonce` equals a different value, again using the `==` double equals sign, which evaluates to a Boolean:

In [37]:
beyonce == "Oscar award-winner"

False

### Your turn!

## Exercise 2: 
Let's practice working with data types!

Run the following cells:

In [38]:
name = 'Prof. Eckert' #string
age = 1000 #integer
place = 'New York' #string 
favorite_food = 'cheese' #string
dog_years_age = age * 7.5 #float
student = False #boolean

In [39]:
print(f'✨This is...{name}!✨')

print(f"""{name} likes {favorite_food} and once lived in {place}.
{name} is {age} years old, which is {dog_years_age} in dog years.
The statement '{name} is a student' is {student}.""")

✨This is...Prof. Eckert!✨
Prof. Eckert likes cheese and once lived in New York.
Prof. Eckert is 1000 years old, which is 7500.0 in dog years.
The statement 'Prof. Eckert is a student' is False.


In [40]:
print(f"""
name = {type(name)}
age = {type(age)}
place = {type(place)}
favorite_food = {type(favorite_food)}
dog_years_age = {type(dog_years_age)}
student = {type(student)}
""")


name = <class 'str'>
age = <class 'int'>
place = <class 'str'>
favorite_food = <class 'str'>
dog_years_age = <class 'float'>
student = <class 'bool'>



In [None]:
name = #Your code here
age =  #Your code here
home_town =  #Your code here
favorite_food =  #Your code here
dog_years_age = age * 7.5 #Your code here * 7.5
student = False #boolean
favorite_movie =  #Your code here

In [None]:
print(f'✨This is...{name}!✨')

print(f"""{name} likes {favorite_food} and once lived in {home_town}.
{name} is {age} years old, which is {dog_years_age} in dog years.
The statement "{name} is a student" is {student}.
{name} loves {favorite_movie}.""")

----

## 4. String Methods

### Practicing more with strings

We're going to practice using Franz Kafka's 1915 novella *The Metamorphosis*.

#### Read a text file
Opening a file gives us a file object, not the text itself. To get the file's contents as a string, we need to first `open` the file and then use the `.read()` method.

In [41]:
sample_text = open("Kafka-The-Metamorphosis.txt", encoding="utf-8").read() # we need to both open and read the file
# Note that the read() operation is important!
# Notice the "utf-8" encoding

In [42]:
print(sample_text)

One morning, when Gregor Samsa woke from troubled dreams, he found
himself transformed in his bed into a horrible vermin.  He lay on
his armour-like back, and if he lifted his head a little he could
see his brown belly, slightly domed and divided by arches into stiff
sections.  The bedding was hardly able to cover it and seemed ready
to slide off any moment.  His many legs, pitifully thin compared
with the size of the rest of him, waved about helplessly as he
looked.

"What's happened to me?" he thought.  It wasn't a dream.  His room,
a proper human room although a little too small, lay peacefully
between its four familiar walls.  A collection of textile samples
lay spread out on the table - Samsa was a travelling salesman - and
above it there hung a picture that he had recently cut out of an
illustrated magazine and housed in a nice, gilded frame.  It showed
a lady fitted out with a fur hat and fur boa who sat upright,
raising a heavy fur muff that covered the whole of her lower arm
t

### Extract parts of Strings

We can use square brackets `[]` with an index position to extract part of a string.

In [43]:
sample_text[0]

'O'

In [44]:
sample_text[1]

'n'

In [45]:
sample_text[2]

'e'

### Slicing:

Remember, we can slice a string up to or between certain characters or by certain increments.

`string[start:stop:step]`

In [46]:
sample_text[:121]

'One morning, when Gregor Samsa woke from troubled dreams, he found\nhimself transformed in his bed into a horrible vermin.'

In [47]:
sample_text[0:121]

'One morning, when Gregor Samsa woke from troubled dreams, he found\nhimself transformed in his bed into a horrible vermin.'

In [48]:
sample_text[121:250]

'  He lay on\nhis armour-like back, and if he lifted his head a little he could\nsee his brown belly, slightly domed and divided by '
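
The cells above use `start` and `stop`. To round out the `string[start:stop:step]` pattern, here is a short sketch on a made-up one-word string (the name `toy_string` and the result names are assumptions for illustration) that also shows negative positions and a `step`.

```python
toy_string = "Metamorphosis"   # a short made-up example string

first_four = toy_string[0:4]      # positions 0, 1, 2, 3 (the stop position is not included)
from_four_on = toy_string[4:]     # leaving out stop means "to the end"
last_four = toy_string[-4:]       # negative positions count from the end
every_second = toy_string[::2]    # a step of 2 takes every second character
backwards = toy_string[::-1]      # a step of -1 walks through the string in reverse

print(first_four)     # Meta
print(from_four_on)   # morphosis
print(last_four)      # osis
print(every_second)   # Mtmrhss
print(backwards)      # sisohpromateM
```

Slicing never changes the original string; each slice is a new string.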

Let’s create a variable `first_line` and assign it the first sentence of *The Metamorphosis*.

In [49]:
first_line = sample_text[:121]
print(first_line)

One morning, when Gregor Samsa woke from troubled dreams, he found
himself transformed in his bed into a horrible vermin.


### String Methods

`string.lower()`  makes the string lowercase

`string.upper()`  makes the string uppercase

`string.title()` makes the string titlecase

`string.strip()` removes leading and trailing white spaces

`string.replace('old string', 'new string')` replaces old string with new string

`string.split('delim')` returns a list of substrings separated by the given delimiter

`string.join(list)` opposite of split(), joins the elements in the given list together using the string

`string.startswith('some string')` tests whether string begins with some string

`string.endswith('some string')` tests whether string ends with some string

`string.isspace()` tests whether string consists only of whitespace
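
Here is a brief sketch of a few of the methods listed above, tried on a made-up string with extra whitespace around it (`toy_title` and the other names are assumptions for illustration).

```python
toy_title = "  the metamorphosis \n"   # made-up string with stray whitespace

cleaned_title = toy_title.strip()       # remove leading and trailing whitespace
print(cleaned_title)                    # the metamorphosis
print(cleaned_title.title())            # The Metamorphosis
print(cleaned_title.startswith("the"))  # True
print(cleaned_title.endswith(".txt"))   # False
print("   \n".isspace())                # True: only whitespace characters
print(cleaned_title.isspace())          # False: contains letters
```

Like slicing, these methods return new values and leave the original string unchanged.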

### Replace words

We can replace words in a string:

`string.replace('old string', 'new string')` : replace one string with another

In [50]:
print(first_line.replace("morning", "evening"))

One evening, when Gregor Samsa woke from troubled dreams, he found
himself transformed in his bed into a horrible vermin.


In [51]:
print(first_line.replace("vermin", "grilled cheese"))

One morning, when Gregor Samsa woke from troubled dreams, he found
himself transformed in his bed into a horrible grilled cheese.


### Transform Strings to Lowercase/Uppercase

In [52]:
("I am really very quiet").lower()

'i am really very quiet'

In [53]:
("I am really very quiet").upper()

'I AM REALLY VERY QUIET'

### Your turn! 

## Exercise 3: 

Transform the first line of Kafka's novella to uppercase:

In [None]:
# Your code here

### Split Strings By a Delimiter

In [54]:
first_line.split()

['One',
 'morning,',
 'when',
 'Gregor',
 'Samsa',
 'woke',
 'from',
 'troubled',
 'dreams,',
 'he',
 'found',
 'himself',
 'transformed',
 'in',
 'his',
 'bed',
 'into',
 'a',
 'horrible',
 'vermin.']

By default, `split()` splits on whitespace (spaces, tabs, and line breaks). But we can split on other things:

In [55]:
first_line.split("dreams")

['One morning, when Gregor Samsa woke from troubled ',
 ', he found\nhimself transformed in his bed into a horrible vermin.']

### Join Strings By a Delimiter


In [56]:
kafka_split_words = first_line.split()
kafka_split_words

['One',
 'morning,',
 'when',
 'Gregor',
 'Samsa',
 'woke',
 'from',
 'troubled',
 'dreams,',
 'he',
 'found',
 'himself',
 'transformed',
 'in',
 'his',
 'bed',
 'into',
 'a',
 'horrible',
 'vermin.']

In [57]:
"SPACE".join(kafka_split_words)

'OneSPACEmorning,SPACEwhenSPACEGregorSPACESamsaSPACEwokeSPACEfromSPACEtroubledSPACEdreams,SPACEheSPACEfoundSPACEhimselfSPACEtransformedSPACEinSPACEhisSPACEbedSPACEintoSPACEaSPACEhorribleSPACEvermin.'

In [58]:
" ".join(kafka_split_words)

'One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed in his bed into a horrible vermin.'

In [59]:
"pizza".join(kafka_split_words)

'Onepizzamorning,pizzawhenpizzaGregorpizzaSamsapizzawokepizzafrompizzatroubledpizzadreams,pizzahepizzafoundpizzahimselfpizzatransformedpizzainpizzahispizzabedpizzaintopizzaapizzahorriblepizzavermin.'

### Files & Character Encoding

To read, write, or manipulate a text file, you must `open` it first. This can be accomplished with the `open()` function.

In [60]:
open('Kafka-The-Metamorphosis.txt', encoding='utf-8')

<_io.TextIOWrapper name='Kafka-The-Metamorphosis.txt' mode='r' encoding='utf-8'>

#### Read a text file
A file object is not the text itself. To get the file's contents as a string, we need to use the `.read()` method.

In [61]:
open('Kafka-The-Metamorphosis.txt', encoding='utf-8').read()



#### Write a text file

In [62]:
open('a-new-file.txt', mode='w', encoding='utf-8')

<_io.TextIOWrapper name='a-new-file.txt' mode='w' encoding='utf-8'>

In [63]:
open('a-new-file.txt', mode='w', encoding='utf-8').write('I just wrote this to a text file. Alright!')

42

In [64]:
open('a-new-file.txt', mode='r', encoding='utf-8').read()


'I just wrote this to a text file. Alright!'

It's good practice to specify the encoding!
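
The cells above leave it to Python to close each file. A common alternative, sketched below, is the `with` statement, which closes the file automatically once the indented block finishes. To avoid overwriting anything in your project, this sketch writes to a temporary folder created with Python's `tempfile` module; the file name `a-toy-file.txt` and the variable names are made up for illustration.

```python
import os
import tempfile

# Create a throwaway folder and a hypothetical file path inside it
toy_folder = tempfile.mkdtemp()
toy_path = os.path.join(toy_folder, "a-toy-file.txt")

# Write: mode="w" creates the file (or replaces its contents if it already exists)
with open(toy_path, mode="w", encoding="utf-8") as file_object:
    file_object.write("I just wrote this to a text file. Alright!")

# Read: the file is opened again, this time in mode="r"
with open(toy_path, mode="r", encoding="utf-8") as file_object:
    toy_contents = file_object.read()

print(toy_contents)
# I just wrote this to a text file. Alright!
```

Specifying `encoding="utf-8"` for both writing and reading keeps the two steps consistent with each other.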

## 5. Conditionals and Comparisons

Now that we've covered some basics, we can do a few more complicated things with Python. We can use it to ask very precise questions about variables, lists, or data that we've defined:

### Comparisons

There are a number of ways we can compare values in Python! Here's a list of some of them:


#### Greater than
Is the variable `person1` greater than `person2`?

In [65]:
person1 = 30
person2 = 30.5
person1 > person2

False

#### Not Equal

Is the variable `person1` not equal to `person2`?

In [66]:
person1 = 30
person2 = 30.5
person1 != person2

True

#### And
What will happen if we check whether `person1` > 30 and `person2` > 30?

In [67]:
person1 = 30
person2 = 30.5
person1 > 30 and person2 > 30

False

The Boolean answer is False because `person1` is not greater than 30 (`person1` is exactly 30) even though `person2` is greater than 30. The `and` operator requires that both conditions be True.

#### Or


What will happen if we check whether `person1 > 30 or person2 > 30`?


In [68]:
person1 = 30
person2 = 30.5
person1 > 30 or person2 > 30

True
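
Here the answer is True because `or` only needs one of the two conditions to be True. For reference, the sketch below runs through a few comparison operators not shown above, along with `not`, which flips a Boolean. The names `toy_person1` and `toy_person2` mirror the values used in this section but are made up so that they don't overwrite the variables above.

```python
toy_person1 = 30
toy_person2 = 30.5

less_than = toy_person1 < toy_person2        # less than
less_or_equal = toy_person1 <= 30            # less than or equal to
greater_or_equal = toy_person1 >= 30         # greater than or equal to
equal_to = toy_person1 == 30.0               # an integer can equal a float
not_greater = not toy_person1 > 30           # not flips False to True

print(less_than)          # True
print(less_or_equal)      # True
print(greater_or_equal)   # True
print(equal_to)           # True
print(not_greater)        # True
```

Every comparison produces a Boolean, which is what makes comparisons useful inside conditionals.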

### Conditionals

### If Statement

An `if` statement is an instruction to do something *if* a particular condition is met.
This typically takes the following two-line form:

- On the first line, you type the English word `if` followed by an expression and then a colon (`:`)
- On the second line, you indent and write an instruction or "statement" to be completed if the condition is met



Here's an assignment statement, followed by a conditional with a comparison:

In [69]:
beyonce = "Grammy award-winner"

In [70]:
if beyonce == "Grammy award-winner": # This is a conditional 'if' statement, followed by a comparison
    print("Congratulations, Beyonce!")

Congratulations, Beyonce!


**Formatting matters!**

Take a look at the two examples below. Can you figure out what's wrong with the syntax?

In [71]:
# What's wrong with this example?

if beyonce == "Grammy award-winner":
print("Congratulations, Beyonce!")

IndentationError: expected an indented block (<ipython-input-71-2ee10a666716>, line 4)

In [72]:
# What's wrong with this example?

if beyonce == "Grammy award-winner"
    print("Congratulations, Beyonce!")

SyntaxError: invalid syntax (<ipython-input-72-b97c0652827d>, line 3)
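
For comparison, here is a version with both pieces of formatting in place: the colon at the end of the `if` line and the indentation on the line after it. The `message` variable is an assumption added for illustration only.

```python
beyonce = "Grammy award-winner"

if beyonce == "Grammy award-winner":        # the first line ends with a colon
    message = "Congratulations, Beyonce!"   # the lines that follow are indented
    print(message)
```

Python uses the indentation to decide which lines belong to the `if` statement, so every line that should run under the condition needs the same indent.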

### Else Statements
You can make conditionals more complicated with `else` statements. An `else` statement instructs the program to do something in case the condition is not met. An `else` comes after an `if` statement and should be formatted the same way.

In [73]:
beyonce = "not a Grammy award-winner this year"

In [74]:
if beyonce == "Grammy award-winner":
    print("Congratulations, Beyonce!")
else:
    print("They messed up, Beyonce.")

They messed up, Beyonce.


### Elif Statements
What if we wanted EVEN MORE complexity? You can add **even more** nuance with `elif` statements. Short for *else if*, an `elif` statement is only evaluated if the `if` statement before it is not True. If none of the `if` or `elif` conditions is True, the `else` statement runs.

In [75]:
beyonce = "Grammy award-nominee"

In [76]:
if beyonce == "Grammy award-winner":
    print("Congratulations, Beyonce!")
elif beyonce == "Grammy award-nominee":
    print("Ok well at least they nominated you, Beyonce.")
else:
    print("They messed up, Beyonce.")

Ok well at least they nominated you, Beyonce.
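
One detail worth seeing in action is that Python stops at the first condition that is True, so the order of the branches matters. A small sketch with a hypothetical `age` value and `description` variable (neither is from the original notebook):

```python
age = 60  # hypothetical value for illustration

if age > 50:
    description = "over 50"
elif age > 30:
    # 60 is also greater than 30, but this branch is skipped
    # because the first condition was already True
    description = "over 30"
else:
    description = "30 or under"

print(description)
```

Only one branch of an `if` / `elif` / `else` chain ever runs, even when more than one condition would be True.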


## 6. List and Loops

When working with data, we're often not just dealing with single variables, but collections of variables, stored in places like spreadsheets, as we've seen. 


One of the most common Python data collections is a *list*. By using a list, we can put the names of the people featured in the dataset into a single collection.

In [77]:
names = ['Mary', 'John', 'Margaret', 'Anthony']

In [78]:
type(names)

list



A **list** is always enclosed by square brackets [ ] and accepts items in a row separated by commas (,). A list can contain any combination of Python data types.
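
As a quick sketch of that last point, the list below mixes a string, an integer, a float, and a boolean. The name `mixed_list` and its values are made up for illustration.

```python
# A hypothetical list holding four different data types
mixed_list = ['Mary', 28, 30.5, True]

print(type(mixed_list))   # the collection itself is still a list
print(len(mixed_list))    # len() counts the items in the list
```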


In [79]:
ages = [28, 19, 60, 30]

### Index

Like with strings, we can index a list! 

If we wanted to pull the first entry in our names list, we would type the following:

In [80]:
names[0]

'Mary'

In [81]:
names[1]

'John'

### Slice


You can also slice lists like you can slice a string.


In [82]:
more_names = ['Unity', 'Catherine', 'Thomas', 'William', 'Patrick',
              'Mary Anne', 'Morris', 'Michael', 'Ellen', 'James']

Here's how we would slice to get the list starting from the 3rd item of the list onward:

In [83]:
more_names[2:]

['Thomas',
 'William',
 'Patrick',
 'Mary Anne',
 'Morris',
 'Michael',
 'Ellen',
 'James']

### Reverse index


Because we can reverse index a list with negative numbers, which count backward from the end, we can also slice a list from the 2nd-to-last item until the end.


In [84]:
more_names[-2:]

['Ellen', 'James']
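
A few more index and slice patterns, sketched with the same `more_names` list. A slice `[start:stop]` includes the item at `start` and leaves out the item at `stop`. The variable names on the left are assumptions added for illustration.

```python
more_names = ['Unity', 'Catherine', 'Thomas', 'William', 'Patrick',
              'Mary Anne', 'Morris', 'Michael', 'Ellen', 'James']

last_name = more_names[-1]       # the last item
first_three = more_names[:3]     # from the beginning up to, but not including, index 3
middle = more_names[2:5]         # indexes 2, 3, and 4
near_the_end = more_names[-3:-1] # 3rd-to-last up to, but not including, the last

print(last_name)
print(first_three)
print(middle)
print(near_the_end)
```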

### List Methods
`list.append(another_item)` 	adds new item to end of list  
`list.extend(another_list)` 	adds items from another_list to list  
`list.remove(item)` 	removes first instance of item  
`list.sort(reverse=False)` 	sorts the list in place (set `reverse=True` for descending order)  
`list.reverse()` 	reverses order of list  

Let's try a few of these list methods out:

In [86]:
names.append("Isabel")

In [87]:
names

['Mary', 'John', 'Margaret', 'Anthony', 'Isabel', 'Isabel']

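Notice what happened? `'Isabel'` appears twice. `.append()` changes the list itself every time the cell runs, and this cell was run twice.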

In [88]:
names.sort()

In [89]:
names

['Anthony', 'Isabel', 'Isabel', 'John', 'Margaret', 'Mary']

In [90]:
ages = [28, 19, 60, 30]

In [91]:
ages.sort()

In [92]:
ages

[19, 28, 30, 60]
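
The other methods in the table work the same way: each one changes the list in place. A short sketch, using the hypothetical names `example_names` and `example_ages` so that the notebook's own `names` and `ages` lists are left untouched:

```python
example_names = ['Mary', 'John', 'Margaret', 'Anthony']

# .extend() adds every item from another list
example_names.extend(['Isabel', 'Isabel'])

# .remove() deletes only the first matching item
example_names.remove('Isabel')

# .reverse() flips the current order
example_names.reverse()
print(example_names)

example_ages = [28, 19, 60, 30]

# reverse=True sorts from largest to smallest
example_ages.sort(reverse=True)
print(example_ages)
```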

### For Loops



One of the best ways to work with a list is with `for` loops. This is a way of considering each item in the list or "iterating" through the list.


In [93]:
names = ['Mary', 'John', 'Margaret', 'Anthony']

In [94]:
for name in names:
    print(name)

Mary
John
Margaret
Anthony




A basic `for` loop will consist of two lines:

- On the first line, you type the English word `for`, a new variable name for each item in the list, the English word `in`, the name of the list, and a colon (`:`)
- On the second line, you indent and write an instruction or “statement” to be completed for each item in the list



In [95]:
for name in names:
    print(f"Person's name is {name}")

Person's name is Mary
Person's name is John
Person's name is Margaret
Person's name is Anthony


We can name the loop variable whatever we would like, so long as we're consistent (though it helps to name it something legible):

In [96]:
for x in names:
    print(f"Person's name is {x}")

Person's name is Mary
Person's name is John
Person's name is Margaret
Person's name is Anthony


We can combine `for` loops with other operations, such as arithmetic and conditionals.

In [97]:
ages = [28, 19, 60, 30]

In [98]:
for age in ages:
    print(age * 2)

56
38
120
60


In [132]:
for age in ages:
    if age > 30:
        print("Person is more than 30 years old")
    else:
        print("Person is 30 years old or younger")

Person is 30 years old or younger
Person is 30 years old or younger
Person is more than 30 years old
Person is 30 years old or younger
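
Loops, conditionals, and list methods can also work together. As a minimal sketch, the loop below collects every age over 30 into a new list instead of printing a message. The list name `over_thirty` is an assumption added for illustration.

```python
ages = [28, 19, 60, 30]

# Start with an empty list and fill it inside the loop
over_thirty = []

for age in ages:
    if age > 30:
        over_thirty.append(age)

print(over_thirty)
```

Because 30 is not greater than 30, only `60` ends up in the new list.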
