<h1>Introduction to Python, Part 1</h1>

<h2>Working with Jupyter Notebooks</h2>

[Jupyter notebooks](https://docs.jupyter.org/en/latest/install.html#jupyter-notebook-interface) are documents that can combine code with text, images, equations, and visualizations. Jupyter notebooks have the file extension .ipynb, which stands for "IPython Notebook." We will be working with Jupyter notebooks on [JupyterHub](https://jupyter.org/hub), a cloud-based platform that gives users access to shared computational environments without requiring any installation. You can access JupyterHub through a web browser and log in using your UW NetID.

<h2>Cells</h2>

Jupyter notebooks are made up of cells, which can be run one at a time. To create a new cell, click the **+** button on the top ribbon or press **option+return** on Mac or **alt+return** on Windows. 

![Screen Shot 2022-10-23 at 9.41.08 PM.png](attachment:a48167e6-76a6-43c2-910d-2cfae35eda42.png)

To run a cell, click the play button in the top ribbon or press **shift+return**.

![Screen Shot 2022-10-23 at 9.57.26 PM.png](attachment:6b777506-d24f-4410-afbd-21e9daa83a5f.png)

<h2>Code &amp; Markdown</h2>

Jupyter notebooks have two main kinds of cells: code and markdown. In the code cells, you can write code that can be executed. In the markdown cells you can write formatted text. You can specify the cell type in the menu in the top ribbon.

![Screen Shot 2022-10-23 at 9.40.30 PM.png](attachment:75797e66-94d4-4b85-b480-117c80b0692f.png)

<h3>Markdown</h3>

**Markdown** is a markup language for text formatting, and it allows you to assign headings, highlight text in **bold** or *italics*, create bulleted lists, incorporate images, and so on. For example:

<h4>Markdown is useful because it...</h4>

 * keeps content and formatting separate
 * makes it easy to convert documents into different formats
 * works across systems and platforms
 * is easy to put online

This block of text is written in markdown. To see how it is structured, double-click the cell to enter editing mode. To leave editing mode and view the formatted cell, press **shift+return**.

**Try it**: create a new markdown cell and respond to the following question. Put at least one word in **bold** and one in *italics* in your answer. **Question**: What is your favorite book and why?

<h3>Code</h3>

**Code** cells are where you can write, edit, and run Python code. As you write, different parts of the code will appear in different colors depending on the role they play (for example, text in quotation marks versus comments). The code in a cell must follow Python's syntax in order to run.

If you want to write regular text in a code cell (text that you don't want executed), you can do so by including a **comment**. The "#" character starts a comment: Python ignores everything from the "#" to the end of the line when you run the cell, as you can see in the example below. For notes that span several lines, you can put three quotation marks """ at the beginning and end of the text. Strictly speaking, this creates a string that Python never uses rather than a true comment, but it is commonly used the same way.

In [None]:
# trying out a print statement

print("It's a beautiful day in Seattle")
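
To see both kinds of notes side by side, here is a minimal sketch. The variable name `note` is made up for this illustration and is not part of the original notebook.

```python
# A comment: Python ignores everything after the "#" on this line.
note = "It's a beautiful day in Seattle"  # a comment can also follow code

"""
This triple-quoted block is a string that is never assigned to a variable,
so running it has no visible effect and it reads like a multiline comment.
"""

print(note)
```

Only the `print` line produces output; the comment and the triple-quoted block are skipped over silently.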

**Try it**: Create a new code cell, add a comment, and print a word or phrase.

<h2>Reading Python Scripts</h2>

Python's keywords are drawn from English, and you can read a script in a similar way: from top to bottom and from left to right. Indentation carries meaning in Python, and it affects how lines of code are executed. Don't worry too much about the specifics of indentation for now, but it's a good thing to note as you read through code.
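
As a rough illustration of how indentation changes what runs, consider the short sketch below. The variable names and the temperature value are hypothetical and chosen only for this example.

```python
temperature = 48  # hypothetical value, in degrees Fahrenheit

if temperature < 50:
    # These indented lines belong to the "if" and run only when it is true.
    advice = "Bring a jacket"
    print(advice)

# This line is not indented, so it runs no matter what the temperature is.
print("Have a good day in Seattle")
```

If you changed `temperature` to 70, the indented lines would be skipped, but the last line would still print.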

<h3>Example 1</h3>

Look at the days-to-seconds calculator (below). Take a moment to look through the code on your own and make an informal guess about what is going on in each line. 

**Question**: if you wanted to turn this from a days-to-seconds calculator into a days-to-minutes calculator, where would you make a change?

In [None]:
#calculate the number of seconds in n days

num_days = 4

num_secs = num_days*24*60*60
response = "In "+str(num_days)+" days there are "+str(num_secs)+" seconds"
    
print(response)

<h3>Example 2</h3>

Below is code adapted from Prof. Walsh's "Anatomy of a Python script" chapter of *Intro to Cultural Analytics and Python*, applied to an eBook of Bram Stoker's *Dracula*. 

If you want to try running this code, first upload the "stoker_dracula.txt" file to the environment. You can do so by clicking the up arrow on the left panel and navigating to the file.

![Screen Shot 2022-10-24 at 9.25.41 AM.png](attachment:8b34c024-6726-429b-8e1b-fd0275bc7e87.png)

In [None]:
"""
Example Python code for
calculating word frequency
in Bram Stoker's Dracula
"""

# Import Libraries and Modules

import re
from pprint import pprint
from collections import Counter


# Define Filepaths and Assign Variables

filepath_of_text = "stoker_dracula.txt"
number_of_desired_words = 40

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']


# Read in File

full_text = open(filepath_of_text, encoding="utf-8").read()


# Manipulate file: make all text lowercase and split into list of words

lowercase_text = full_text.lower()
split_words = re.split(r"\W+", lowercase_text)  # raw string (r"...") keeps the backslash intact


# Remove stopwords

meaningful_words = []
for word in split_words:
    if word not in stopwords:
        meaningful_words.append(word)

# Analyze list of meaningful words

meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)


# Output Results

pprint(most_frequent_meaningful_words)

Now let's break this up a bit. We'll go through each chunk of code, run it, and see what kind of output we get.

In the lines below, we're importing external libraries. These are packages of code created by other people that we can use to help with the tasks we want to complete. Since we are just importing the packages here, no output will appear in the notebook.

In [None]:
# Import Libraries and Modules

import re
from pprint import pprint
from collections import Counter

Next, we're defining filepaths and assigning variables. First we indicate the path to the file we plan to use. Then we indicate the number of words we want to appear in the output. Then we define a list of "stopwords" that will not be included in our calculations.

In [None]:
# Define Filepaths and Assign Variables

filepath_of_text = "stoker_dracula.txt"
number_of_desired_words = 40

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']

Next, we read in the Dracula file through the "filepath_of_text" variable we defined earlier. This will not produce an output, but if we want to print part of the text of the novel, now we can do so.

In [None]:
# Read in File

full_text = open(filepath_of_text, encoding="utf-8").read()

In [None]:
print(full_text[:2000])
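
The two cells above read a whole file into one string and then print a slice of it. The sketch below shows the same idea on a tiny throwaway file, so it runs without "stoker_dracula.txt". The filename `sample_text.txt` and the variable names are assumptions made for this example. It also uses a `with` block, which is a common alternative way to open files and is not what the original script does.

```python
import os
import tempfile

# Create a small throwaway file so this example runs on its own.
sample_path = os.path.join(tempfile.gettempdir(), "sample_text.txt")  # hypothetical filename
with open(sample_path, "w", encoding="utf-8") as sample_file:
    sample_file.write("A short sample text.")

# Reading inside a "with" block closes the file automatically afterwards.
with open(sample_path, encoding="utf-8") as sample_file:
    sample_text = sample_file.read()

print(sample_text[:7])  # slicing with [:7] keeps only the first 7 characters

os.remove(sample_path)  # clean up the throwaway file
```

In the same way, `full_text[:2000]` keeps only the first 2000 characters of the novel.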

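Here we use .lower() to change all the text in *Dracula* to lowercase, and we use re.split() to break it up into a list of individual words. The pattern "\W+" matches any run of characters that are not letters, digits, or underscores, so the text is split at spaces and punctuation.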

In [None]:
# Manipulate file: make all text lowercase and split into list of words

lowercase_text = full_text.lower()
split_words = re.split(r"\W+", lowercase_text)  # raw string (r"...") keeps the backslash intact

print(split_words[:40])
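
To see why the script splits with a pattern instead of only on spaces, it may help to compare the two approaches on one short sentence. This is a small sketch; the sample sentence and variable names are chosen for illustration.

```python
import re

sample = "Welcome to my house! Enter freely."  # short sample sentence
lowercase_sample = sample.lower()

# str.split() breaks only on whitespace, so punctuation stays attached to words.
whitespace_words = lowercase_sample.split()

# re.split(r"\W+", ...) breaks on any run of non-word characters, so
# punctuation is dropped. An empty string can appear at the end of the list.
regex_words = re.split(r"\W+", lowercase_sample)

print(whitespace_words)
print(regex_words)
```

The first list contains 'house!' and 'freely.' with punctuation attached, while the second contains 'house' and 'freely' followed by an empty string left over from the final period.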

We then remove stopwords--very common words like "a" or "the," which we do not want to include in our analysis--and create a list of "meaningful words."

In [None]:
# Remove stopwords

meaningful_words = []
for word in split_words:
    if word not in stopwords:
        meaningful_words.append(word)

In [None]:
print(meaningful_words[:40])
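
The loop is easier to trace on a list short enough to check by hand. The sketch below repeats the same logic with a made-up word list and a tiny stand-in for the full stopword list; all of the names starting with `sample_` are hypothetical.

```python
sample_words = ["the", "count", "was", "in", "the", "castle"]  # hypothetical word list
sample_stopwords = ["the", "was", "in"]  # tiny stand-in for the full stopword list

sample_meaningful = []
for word in sample_words:
    # Keep the word only if it is not one of the stopwords.
    if word not in sample_stopwords:
        sample_meaningful.append(word)

print(sample_meaningful)  # ['count', 'castle']
```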

Finally, we find the most common words in our list of meaningful words and print the results (below).

In [None]:
# Analyze list of meaningful words

meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

In [None]:
# Output Results

pprint(most_frequent_meaningful_words)
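
To get a feel for what Counter and .most_common() return, here is a minimal sketch on a short, made-up list. The variable names and the words are assumptions for illustration only.

```python
from collections import Counter

sample_words = ["blood", "night", "blood", "count", "night", "blood"]  # hypothetical list

sample_tally = Counter(sample_words)   # counts how many times each word appears
top_two = sample_tally.most_common(2)  # the 2 most frequent words, as (word, count) pairs

print(top_two)  # [('blood', 3), ('night', 2)]
```

The number passed to .most_common() controls how many (word, count) pairs come back.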


**Question**: If you wanted to change the number of words that appear in the output, where would you make the change?

<h2> Save your work!</h2>

To save your notebook, go to File > Save Notebook in the top ribbon.

Or press Command ⌘ + S (Mac) / Ctrl + S (Windows).

<h2>Variables</h2>
<h3>Assigning Variables</h3>

A lot of the steps we just did involved working with variables. You can think of a variable (drawing on Prof. Walsh's framing) as a "tiny container where you store values and data, such as filenames, words, numbers, collections of words and numbers, and more."

Variables point to values, which you assign. To assign a variable you use the **=** sign.

In the bit of code we've been working with, we've assigned a number of variables:

 * filepath_of_text
 * number_of_desired_words
 * stopwords
 * full_text
 * lowercase_text
 * split_words
 * meaningful_words
 * meaningful_words_tally
 * most_frequent_meaningful_words

**Try it**: print some of these variables out to see what they contain.

In [None]:
print(filepath_of_text)

<h3>Naming Variables</h3>

(*from Intro to Cultural Analytics and Python*)

Though we named our variables filepath_of_text, stopwords, number_of_desired_words, and full_text, we could have named them almost anything else. Variable names can be as long or as short as you want, and they can include:

* upper and lower-case letters (A-Z, a-z)
* digits (0-9), as long as the name does not start with one
* underscores (_)

However, variable names cannot include:

❌ other punctuation (-.!?@)

❌ spaces ( )

❌ a reserved Python word

It is important to strive for meaningful variable names. In the example above it was easier to follow what was in each variable because they were named in a logical way.
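
One way to test a name against these rules without triggering an error is sketched below. It relies on the built-in string method .isidentifier() and the standard-library `keyword` module; the candidate names are hypothetical examples.

```python
import keyword

# Hypothetical candidate names to check against the rules above.
candidates = ["full_text", "word2", "2nd_word", "my-variable", "my variable", "class"]

results = {}
for name in candidates:
    # .isidentifier() checks the characters; keyword.iskeyword() checks reserved words.
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, results[name])
```

Only "full_text" and "word2" come back as True. The others start with a digit, contain a hyphen or a space, or are a reserved Python word.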

<h3>Re-assigning Variables</h3>

You can re-assign the value of a variable at any point. For example, if we wanted to change the number of words that appear in our list of most frequent meaningful words, we could change the value of the "number_of_desired_words" variable.
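
A minimal sketch of re-assignment, using the same variable name as the script (the new value of 10 is just an example):

```python
number_of_desired_words = 40
print(number_of_desired_words)

# Assigning again replaces the old value; the variable now points to 10.
number_of_desired_words = 10
print(number_of_desired_words)
```

Keep in mind that cells which already ran with the old value do not update on their own. You would need to re-run any cell that uses the variable to see the change.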

<h2>Python Datatypes</h2>

There are 4 fundamental Python datatypes:

 * Strings (text)
 * Integers (whole numbers)
 * Floats (decimal numbers)
 * Booleans (True/False)
 

<h3>Checking datatypes</h3>
You can check the datatype of any value by using the function type(). Let's look at the variables we assigned in the beginning of class:

In [None]:
type(full_text)
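
As a quick sketch, type() can be tried on one value of each of the four datatypes listed above. The variable names and values here are made up for illustration.

```python
title = "Dracula"      # string
page_count = 400       # integer (hypothetical value)
average_rating = 4.5   # float (hypothetical value)
is_novel = True        # Boolean

for value in [title, page_count, average_rating, is_novel]:
    print(value, type(value))
```

Each line of output pairs the value with its datatype, which Python reports as str, int, float, or bool.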

<h3>Casting to different datatypes</h3>

![python_datatypes.png](attachment:8516f8a2-15bf-4a8b-8ed5-96555af8f5a0.png)

These are basic descriptions and examples of the different datatypes, but you can also cast certain values to other datatypes. To cast to other datatypes, you can use the functions:

 * str()
 * int()
 * float()
 * bool()
 
Importantly, this only works in certain directions. You can, for example, cast an integer to the string datatype, so it is treated as text, but you can't turn a string with alphabetic characters into an integer.

In [None]:
days_in_a_year = str(365)
type(days_in_a_year)

However, a string that contains alphabetic characters cannot be turned into an integer. Running the cell below produces a ValueError.

In [None]:
int("there are 365 days in a year")
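
The sketch below gathers a few casts that do work, plus one way to handle the error without stopping the notebook. The variable names are hypothetical, and the try/except structure goes slightly beyond what this lesson covers, so treat it as a preview rather than something you need to know yet.

```python
days_as_int = int("365")      # a string made only of digits can become an integer
half_as_float = float("3.5")  # or a float
days_as_text = str(365)       # a number cast to a string can be joined to other text
print(days_as_int + 1, half_as_float, days_as_text + " days")
print(bool(0), bool(1))       # 0 is False; any other number is True

conversion_failed = False
try:
    int("there are 365 days in a year")
except ValueError as error:
    # int() raises ValueError when the string is not a valid whole number.
    conversion_failed = True
    print("Could not convert:", error)
```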

**Question**: take a look at our days-to-seconds calculator. Where in the code are we casting to a different datatype? What is the original datatype of the variables we're working with? And what datatype did we change the values to in our final output?

In [None]:
#calculate the number of seconds in n days

num_days = 4

num_secs = num_days*24*60*60
response = "In "+str(num_days)+" days there are "+str(num_secs)+" seconds"
    
print(response)