## Introduction to Cultural Analytics (ht-2023)

Matti La Mela, matti.lamela@abm.uu.se

### Lab material 1a: Introduction to Python basics

This Notebook introduces us to basic programming concepts in Python such as variables, operations, iterations, and functions. The concepts would be similar in other programming languages too.
<br>
<br>
The reference reading for this learning material and week 1 is:

**o "Python Basics" in Walsh, M. (2021). Introduction to Cultural Analytics and Python. https://melaniewalsh.github.io/Intro-Cultural-Analytics/welcome.html**

Other works on Python programming basics that you can consult are:

o Chapters 1-3, 5, 7-8 and 10 in Downey, A. B. (2015). Think Python: How to Think Like a Computer Scientist, 2nd ed., Needham, https://greenteapress.com/wp/think-python-2e/

o Chapter 1 (with a focus on text only) in Bird, S., Klein, E. and Loper, E. (2019). Natural Language Processing with Python (NLTK 3.0), https://www.nltk.org/book/

o Mattingly, W. J. B. Introduction to Python for Humanists (http://python-textbook.pythonhumanities.com/intro.html), and youtube tutorials: https://www.youtube.com/@python-programming (Textbook from Routledge (2013), "Introduction to Python for Humanists")

<br>

### Code cell:

When you run a code cell, the output appears below the cell. You are free to modify the code yourself. If the code contains an error, for example a syntax error, an error message appears below the cell.

New cells, which can then be defined as Code or Markdown, can be created by clicking the plus symbol. Scissors are used for cutting. Shift-enter or the play symbol runs the cell.

We should run the cells in the order they are entered here, as some of the variables or operations in the cells are required for the cells below to work. The order in which the cells were run is given in the brackets next to each cell. Jupyter Notebook stores the variables we have created in the previous cells.

In [None]:
# This is a code cell. The hash sign (#) is used to write comments that are not executed, but which describe the code.
# The code is executed from the beginning to the end.

# We use the print() function to print in the output section below the cell.

print(2)

# The program first executes what is inside the () of our function

print(5 + 5)

In [None]:
# When we print a string, we need to have it between quotation marks "".

print ("Hello world")

# We can also combine strings together with +

print ("Hello" + " " + "world" + " " + "!!!")


## 1. Variables

Variables are used to store information (values) that we calculate and process in our code. Variables have different data types depending on the values we assign to them, e.g. numeric values or strings. A variable is created and given a type (integer, string) when we assign a value to it for the first time.

We can make up the variable names ourselves, but they should be precise, meaningful, and concise to keep our code understandable for us and others. Variable names cannot start with a number and cannot include whitespace (use an underscore instead). Reserved keywords of the Python syntax, e.g. "not" or "if", are not valid variable names. Names that are already defined, such as the built-in function print, can technically be reassigned, but this overwrites the original and should be avoided.
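The naming rules above can be checked in code. The following is a minimal sketch, assuming we use the standard library module `keyword` and the string method `.isidentifier()`; the candidate names are made up for illustration, and the `for` loop and `if` statement used here are explained later in this notebook.

```python
import keyword

# Hypothetical candidate names, chosen only to illustrate the rules.
candidates = ["book_title", "2nd_book", "book title", "if", "not"]

valid_names = []
for name in candidates:
    # A usable name must follow the naming rules and must not be a reserved keyword.
    is_valid = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "->", is_valid)
    if is_valid:
        valid_names.append(name)

print(valid_names)
```

Only "book_title" passes both checks: "2nd_book" starts with a number, "book title" contains whitespace, and "if" and "not" are reserved keywords.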

In [1]:
# Let's define two variables that have the numeric values of tables and chairs in a classroom

tables = 4
chairs = 12

print(chairs)
print(tables)

12
4


In [None]:
# Jupyter notebooks print the variable (or other value) also without print(), but only the value of the last variable that you want to print.
# This works well when for example examining the code.

chairs

# tables

In [None]:
# We create here four variables by assigning values to them.

book_title = "Introduction to Cultural Analytics & Python"
book_year = 2021
book_rating = 9.3
book_is_available = True


In [None]:
# Defining the variables does not give us any output. The values are assigned and stored in the variables until we close the notebook
# or reset the Kernel (the program that runs our code).

# We print the string "The book to be analysed is: " and add there the value of the variable book_title (which is also a string).

print("The book to be analysed is: " + book_title)

# When printing the string "The book was published in: " and the integer "book_year", we get an error unless we convert the numerical value
# into a string. We cannot add together a string data type and a numerical data type. We convert the numerical value with str() function.

print("The book was published in: " + str(book_year))

# Another nice way to print strings and number values is to use the so-called "f-strings", which are formatted string literals.
# They make it easy to combine literals and variables into strings, and they also have options for formatting,
# e.g. defining how the printed output is aligned. See for example: https://fstring.help/cheat/

# We put the letter f before the "", and refer to variables with {}.

print(f"The book {book_title} was published in {book_year}")



In [None]:
# Booleans are data types with two values: True or False, thus they are binary (0 or 1).

# We use here the if statement to see if our variable "book_is_available" is True or False:

if book_is_available:
    print("The book is available at the library")
else:
    print("The book is not available at the library")



### These variables are of four data types:

**o book_title** was created as a string variable. This is done by typing a sequence of characters between quotation marks (""). A string can hold one character, many characters, or several words/tokens; note that the "space" is also a character for the computer.

**o book_year** variable is an integer (int). Integers are whole-valued numbers and a very commonly used variable type.

**o book_rating** is a floating-point (float) variable, as it allows decimal points.

**o book_is_available** is a boolean variable (bool). Booleans are binary data types and they are either True or False (thus 1 or 0).
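Values can also be converted from one data type to another. The sketch below assumes the built-in functions `int()` and `str()`; the variable names are made up for this example.

```python
year_as_text = "2021"                  # a string that happens to contain digits
year_as_number = int(year_as_text)     # string -> integer

rating_as_text = str(9.3)              # float -> string
rating_without_decimals = int(9.3)     # float -> integer (the decimals are dropped)
available_as_number = int(True)        # boolean -> integer

print(year_as_number + 1)
print(rating_as_text)
print(rating_without_decimals)
print(available_as_number)
```

Note that `int()` does not round: it simply cuts off the decimal part.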


In [None]:
# We are able to see the variable type with the function type().

# You can change the variable book_year to book_title, and see what types the other variables are.

variable_type = type(book_year)

print(variable_type)

In [None]:
# Strings are "sequences of characters", thus not words per se.

# We are able to work with the individual characters by using square brackets, which contain the index.
# There is no separate data type for individual characters: a character is a string that has the length of one.

print(book_title)

# What is the first character? The indexes start from offset 0!

print(book_title[0])



In [None]:
# We can also pick a slice of the string by typing "from:to" inside the brackets. NB, "from" starts from 0, and the character
# at index "to" is not included in the slice.

# Thus, if we want the range from the first to the fifth character, we start from index 0.
# The fifth character has index 4, but as we want to include it, we type 4 + 1 = 5. Index 5 points to the sixth character,
# which is not included. See also: Think Python, ch. 8.4.

print(book_title[0:5])



In [None]:
# We can use another variable as the index:

end = 2

print(book_title[0:end])

# If we want to start from the beginning of the string, we can leave out the 0 and simply type e.g. [:6]. Even more usefully,
# it works the other way round too, so [4:] takes us from the fifth character until the end of the string.

print(book_title[4:])

# Another tip is to use negative values in the index. -3 points to the third-to-last character of our string, so [-3:] gives the last three characters.

print(book_title[-3:])

### 1.1 Operators and variables

Arithmetic operators are used to do calculations with the variables: + (addition), - (subtraction), * (multiplication), / (division). The operators work with number variables; + and * also work with strings.

See for example: https://melaniewalsh.github.io/Intro-Cultural-Analytics/02-Python/05-Data-Types.html

http://python-textbook.pythonhumanities.com/01_intro/01_02-02_data.html

We will talk later about relational operators (e.g. "greater than" > ) and logical operators (and, or). These help us to compare two variables, and they return True or False depending on the condition.
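As a quick illustration of the arithmetic operators, here is a small sketch with made-up variable names and values. It also shows that `/` always produces a float, and how `+` and `*` behave with strings.

```python
pages = 7
readers = 2

print(pages + readers)    # addition
print(pages - readers)    # subtraction
print(pages * readers)    # multiplication
print(pages / readers)    # division

# Division gives a float even when the result is a whole number.
even_split = 8 / 2
print(even_split, type(even_split))

# With strings, + joins and * repeats.
title = "Data" + " " + "Feminism"
underline = "-" * len(title)
print(title)
print(underline)
```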

In [None]:
# We have the number of words in two books assigned to book_1_words and book_2_words. We want to calculate the sum and average of the words.

book_1_words = 291
book_2_words = 496

book_all_words = book_1_words + book_2_words

print("The books contain " + str(book_all_words) + " words")

# The average is the sum divided by number of books.

book_average_words = book_all_words / 2

print("The average word count is " + str(book_average_words))


In [None]:
# We assign another value to book_1_words

book_1_words = 100

# Note that variables are not dynamic but rather containers: "book_all_words" still holds the old sum and is not updated automatically.
# Thus, we need to redo the calculation above and assign the value again to "book_all_words".

print(book_all_words)

book_all_words = book_1_words + book_2_words

print(book_all_words)


In [None]:
# Example with a string variable:

print(5 * book_title)

### 1.2 Lists

It is a bit cumbersome to create a variable for every book, e.g. if we have 200 books in our dataset. During the next labs we will learn to use Pandas dataframes (cf. a spreadsheet).

Another data type that is useful for storing several values is the list. A list contains values of other data types in a sequential order (like strings contain characters in a sequential order). Lists are useful in data processing: for example, we can tokenize a sentence into a list of words and then work on every element in the list.
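The tokenization idea mentioned above can be sketched as follows. This assumes the string method `.split()`, which is not covered elsewhere in this notebook; without arguments it cuts a string at whitespace and returns a list of strings. The sentence is only an example.

```python
sentence = "Midway upon the journey of our life"

# .split() without arguments cuts the string at whitespace and returns a list.
tokens = sentence.split()

print(tokens)
print(len(tokens))    # how many tokens we have
print(tokens[0])      # the first token
print(tokens[-1])     # the last token
```

This is a very simple form of tokenization: punctuation attached to a word would stay part of the token.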


In [None]:
# Here is an example of a list of integers. A list is created by using square brackets, and the values in the list are separated with commas.

observed_values = [14, 15, 16, 20, 12]

print(observed_values)

print("We have " + str(len(observed_values)) + " values in our list of observed values")

# or as an f-string:

# print(f"We have {len(observed_values)} values in our list of observed values")


In [None]:
# Similar to strings (and the characters of the string), we can use the index to point at a certain value in the list.

print("The first value is: " + str(observed_values[0]))


In [None]:
# Here is a list of strings:

book_titles = ["Introduction to Cultural Analytics & Python", "Distant Horizons", "Data Feminism"]

print(book_titles)

# The len() function gives us the length of a string (how many characters), but also the length of a list (how many elements it stores).

print("The number of books on our list: " + str(len(book_titles)))


In [None]:
# It is possible to create a list of lists (a nested list), but at this point it might make sense to start using
# a more suitable and powerful solution like the Pandas data frames.

book_information = [["Title1", 2012], ["Title2", 2014]]

print("The title of our first book is: " + book_information[0][0])
print("The publication year of our first book is: " + str(book_information[0][1]))

In [None]:
# We can add and remove entries from a list with list methods.

# .pop() removes the last element of the list (or, given an index, the element at that position). .remove("NAME") removes the first element that has the value "NAME". .remove() takes a value, not an index, but we can look up the value with an index first.

# So for removing the last entry, we could write either:

book_titles.pop()

# book_titles.remove(book_titles[-1])

# book_titles.remove("Data Feminism")

print(book_titles)

In [None]:
# Let's add Data Feminism back to our list of books. We can add elements to the list with the .append() and .insert() methods.

# .append() adds an element to the end of the list.

book_titles.append("Data Feminism")

print(book_titles)

# To insert a new value at a specific place in the list, we use the .insert() method. The index number is "in between" two list elements, thus 0 is at the beginning of the list, 1 is between the first and second element. Here, 3 places the new title at the end of our three-element list.

book_titles.insert(3, "The Digital Humanities Coursebook")

print(book_titles)


### 1.3 String methods

Variables have built-in functions that allow us to do various operations with them. These are called methods (functions that belong to an "object"). We call a method by typing the variable name, a dot, and the method name. We used list methods already in the list example above: book_titles.append().

The most useful for us are string methods and list methods. You can read more about methods in our coursebook, or in Think Python chapter 8.8 (string methods) or 10.6 (list methods).

TIP: In IPython, which the Jupyter Notebook uses, you can get a list of all the methods by typing the variable name followed by a dot and pressing "tab" on your keyboard. "Tab" can also be used to complete the variable name you are typing.

Also, in IPython, we get details about any object by adding a question mark after it, e.g. book_titles.append?
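Here is a brief sketch of the "variable dot method_name" pattern with a few string methods. The methods `.strip()`, `.startswith()` and `.count()` are not covered elsewhere in this notebook and are chosen only as examples; the variable names are made up.

```python
raw_title = "  Data Feminism  "

# .strip() removes whitespace from both ends of the string.
stripped = raw_title.strip()

# .startswith() returns True or False.
starts_with_data = stripped.startswith("Data")

# .count() tells how many times a substring occurs.
count_of_i = stripped.count("i")

# Methods can be chained: the next method works on the result of the previous one.
cleaned = raw_title.strip().lower()

print(stripped)
print(starts_with_data)
print(count_of_i)
print(cleaned)
```

String methods return a new value and leave the original string unchanged, so `raw_title` still contains the extra spaces.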

In [None]:
# We can transform our strings to uppercase or lowercase rather easily by using the .lower and .upper methods:

book_name = "The Digital Humanities Coursebook"

print (book_name.lower())

upper = book_name.upper()
print(upper)

In [None]:
# For more information about any object, just use ?

book_name?

# or for example:

# book_name.upper?

# For methods, it might be better to google a bit, so you get also examples how to use the methods in practice.

In [None]:
# With s.replace("a", "b"), we can replace all a with b in our string s.

# In this case, all letters "o" have been recognized as zeros "0" in the OCR process. Let's replace them.

sentence = "Midway up0n the j0urney of 0ur life, I f0und myself within a f0rest dark"

cleaned_sentence = sentence.replace("0", "o")

print(cleaned_sentence)

In [None]:
# Replace is useful for removing characters from a string, by replacing them with nothing, ""

# We are studying tweets, but we want to remove all question marks and exclamation marks.

tweet = "I think Uppsala is a great city???!!!!???"

tweet = tweet.replace("!", "")
tweet = tweet.replace("?", "")

print (tweet)

# There are also more efficient ways for cleaning strings, for example by using regular expressions, and we will come back to these later.

## 2. Conditional statements and operators

Another group of operators are relational or comparison operators: == (is equal?), != (is not equal?), > (is greater than?), < (is less than?), >= (is greater than or equal?), <= (is less than or equal?). The comparisons return truth values: True or False.

We can accompany these with the logical operators and, or and not, which are familiar to us e.g. from Google searches.
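A small sketch of how comparison and logical operators combine; the values and variable names are only examples.

```python
book_year = 2021
book_rating = 9.3

is_recent = book_year >= 2020
is_highly_rated = book_rating > 9

# "and" is True only if both sides are True.
both = is_recent and is_highly_rated

# "or" is True if at least one side is True.
either = is_recent or book_year < 1900

# "not" flips the truth value.
not_recent = not is_recent

print(both)
print(either)
print(not_recent)
```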

In [None]:
result = 5 > 4
print (result)

print(type(result))


In [None]:
# Are the following true or false?

print (5 == 5)
print (4 < 7)
print (8 != 8)


In [None]:
# Note that = is used for assigning a value to a variable

# and == is a relational operator, which asks whether the two values are equal or not

In [None]:
# == and != work with strings

book1 = "Introduction to Digital Humanities"
book2 = "Cultural Analytics"

print(book1 == book2)        # Here we are asking: is the value of "book1" equal to the value of "book2"?


In [None]:
# Strings consist of characters for the computer, and we have to consider case ("A" is different from "a").
# So the following is False due to case-sensitivity: A is not a! That's where lower() might come in useful.

print ("Cultural analytics" == "cultural analytics")


One key aspect of programming is to use conditional statements to guide how our program runs. This is done with "if", "else", and "elif".

If the condition of our "if" statement is true, the following code block is executed. If it is false, nothing happens, or the program continues with the code in "else".

"elif" can be used to give another condition to test if the preceding conditions have not been met.

**NOTE!** Here we indent the code block that will be run if the "if" condition is true. The same goes for "elif" and "else".
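A minimal sketch of "if", "elif" and "else" used together; the thresholds and messages are made up for illustration.

```python
books_amount = 124

# The conditions are tested from top to bottom; only the first matching block is run.
if books_amount >= 150:
    status = "complete"
elif books_amount >= 100:
    status = "almost there"
else:
    status = "just started"

print(f"With {books_amount} books, the corpus is: {status}")
```

Since 124 is not at least 150, the first condition fails and Python moves on to the "elif" condition, which is true.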

In [None]:
# We have entered 124 books in our corpus, but we want to reach the level of 150 books. Note the indentation!

books_amount = 124

if books_amount >= 150:
    print("We have enough books, at least 150!")
else:
    print("We still need more books to the corpus, get back to work!")


# You can change the value of the variable books_amount to see if we get the other message too.

# Remember that the if is followed by a boolean condition: books_amount >= 150 returns either True or False. If True, do this; else, do that.


In [None]:
# We can add logical operators to form more complex conditions for the if statement.

book1 = "Cultural Analytics"
book2 = "Cultural Analytics"

if book1 == book2 and books_amount >= 150:
    print("Both conditions are met!")
else:
    print("Either book1 == book2 is False, or books_amount >= 150 is false.")

## 3. For and while loops

While and for statements are used to repeat a block of code. "For" is used to go through an iterable object like a string, list or "range". "While" repeats the block of code as long as a condition holds, and stops when the condition becomes false. Especially with "while" we can accidentally create an infinite loop (the condition never becomes false). In Jupyter, we can stop such a loop by interrupting the kernel (the stop button in the toolbar); in a terminal, ctrl-c does the same.

Similar to if, we use indentation to define the code blocks that are part of the loops.
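To make the difference concrete, here is a small sketch with made-up values: a "for" loop that visits every element of a list, and a "while" loop that runs as long as its condition is true.

```python
word_counts = [291, 496, 100]

# "for" goes through the list one element at a time.
total = 0
for count_value in word_counts:
    total = total + count_value

print(f"Total number of words: {total}")

# "while" repeats as long as the condition is True.
remaining = 3
steps = 0
while remaining > 0:
    remaining = remaining - 1    # without this line the loop would never end
    steps = steps + 1

print(f"The while loop ran {steps} times")
```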

In [None]:
# The for statement executes the loop over a sequence, for example range(a, b). It proceeds incrementally.

# We have the variable "i" going incrementally from 0 to 4. Again, 5 means here the "end of 4" and the beginning of 5, so 5 itself is not included.

for i in range(0, 5):
    print(f"For this iteration of the loop the value for i is: {i}")
    if i == 2:
        print("i is two")
    
    

In [None]:
# In this example, we repeat the while loop until the variable "count" has reached 5. The "count" starts from 1.

# The condition for the while loop is: continue repeating the block of code while "count" is less than 5.

count = 1

print (f"We start from: {count}")

while count < 5:
    print("Count is less than 5, so we are in the while loop.")
    count += 1    # is same as: count = count + 1
    print(f"We increase count by one. It is now: {count}")
    print("Let's go back to the while statement and see if the condition count < 5 is false. If it is true, we stay in the while loop for another iteration.")

    
print(f"Outside the while loop, because count < 5 was false. Count is now: {count}")



In [None]:
# For is very useful for iterating through strings (the characters of the string) or lists (all elements in a list).

tweet = "What a wonderful day!"

# The for statement repeats the indented block of code. At every step, it assigns the next character of the string "tweet", in sequential order, to our variable "char".

for char in tweet:
    print("The character is: " + char)
    

In [None]:
# We can use the index here to take only a slice of the string.

for char in tweet[0:5]:
    print("the character is: " + char)


In [None]:
# Finally, we can use the for loop also to go through lists like we used it for ranges and strings.

book_titles = ["Introduction to Cultural Analytics & Python", "Distant Horizons", "Data Feminism"]

# We use for to iterate through our list of book titles. Inside the for loop, we print the length of each string in our list.

# At each iteration, the value of the list element is assigned to "title"

for title in book_titles:
    print(f"The book title is: {title}")
    print(f"The length is: {len(title)}")
    print("")


## 4. Functions

We have used functions already above, for example, print(), type() and len(). We are also able to define functions ourselves for operations that we repeat often in our program. We give the function a name and the code that we want it to include. A function is not run by the program unless we call it.

Functions often take an "argument" when they are called: the argument is placed between the parentheses, e.g. len(sentence). Many functions give back a result, called a return value, for example, the length of the variable "sentence".
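As a minimal sketch of a function with an argument and a return value, the hypothetical function `average_word_count` below takes a list of numbers and returns their average. It assumes the built-in function `sum()`, which adds up the values in a list.

```python
def average_word_count(word_counts):
    # sum() adds the values together, len() gives the number of values.
    return sum(word_counts) / len(word_counts)

# We call the function with a list as the argument and store the return value.
average = average_word_count([291, 496])

print(f"The average word count is {average}")
```

The function can now be reused with any list of numbers, instead of repeating the calculation each time.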

In [None]:
# We define a simple function called "print_text", which prints text. There is no return value or arguments needed.

def print_text():
    print("Our function prints text")
    print("And another line")
    print("and one more line")
    print("")
    
# we can call our function. NB! The code starts from here, the function is just defined but not executed until we call it in the main part
# of our code here below.

# Uncomment (remove the hashtags) and see how the function is called.

# print_text()
# print_text()
# print_text()


In [None]:
# We define our own function, which greets us. The argument given to the function is assigned to variable "name".
# The variable "name" is printed as part of the code in the function.

def greet(name):
    print("Hello " + name + "!")
    
# Now we call the function and give it the argument "Your_name":

greet("Your_name")

# You can type your name to our greet() function.

In [None]:
# We have a data processing case, where we need to convert strings to lowercase, remove all exclamation marks and add a question mark
# to the end.
# We think that a function could be useful, as we repeat this in our program several times. In this function we have a return value
# in the end.

# We first define the function that we name clean_text. The argument given to the function is assigned to the variable "text".

def clean_text(text):
    text = text.lower()             # Here we convert the text variable into lowercase
    text = text.replace("!", "")    # Here we remove all "!", thus replace them with ""
    text = text + "?"               # Here we add a question mark to the end
    return text

tweet = "Uppsala is truly a GREAT city!!!"

# We process the tweet with our clean_text function and then print the output.

result = clean_text(tweet)

print(result)

# Note that we assigned the output, that is, what our function returned, to the variable "result". Our original string "tweet" has not changed. We assigned
# it the string "Uppsala is truly a GREAT city!!!", and it stays there in the "tweet" variable until changed.

# print(tweet)


### Well done!

We are done with some basics of Python programming. These areas of basic programming are useful for us when we move to data processing. We will next look at how to read and write files (Material 1b).

Please revise these basics in our main textbook by Walsh. It is recommended that you read especially about one more built-in data type called the dictionary. Dictionaries are collections of key and value pairs. They are like lists, but the index (the key) does not have to be a number: it can also be of another data type, such as a string. For dictionaries, see https://melaniewalsh.github.io/Intro-Cultural-Analytics/02-Python/11-Dictionaries.html or Think Python chapter 11.
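To give a first impression of dictionaries, here is a minimal sketch that stores the book information used earlier in this notebook under string keys. The key names are made up for illustration.

```python
# A dictionary is created with curly brackets and key: value pairs.
book = {
    "title": "Introduction to Cultural Analytics & Python",
    "year": 2021,
    "is_available": True,
}

# We look up a value with its key instead of a numerical index.
print(book["title"])

# We can add a new key and value pair by assigning to a new key.
book["rating"] = 9.3

# .items() lets us go through all the key and value pairs.
for key, value in book.items():
    print(f"{key}: {value}")
```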
