# 1. Quick intro to Jupyter Notebooks

For more info: [Jupyter Notebooks introduction](https://realpython.com/jupyter-notebook-introduction/)

### Notebook basics

Try to run the following code block. Remember that the shortcut is `shift+enter`, or you can press "Run" in the menu:

In [None]:
#This is going to print something
print("Welcome to Ling 450/807!") #a comment can also go here

Great! You ran a code block. You can also write text blocks by changing the drop-down menu above to "Markdown". You will need to get familiar with [markdown syntax](https://www.markdownguide.org/basic-syntax/).

It's good practice to write very specific comments next to your code (right before or to the right of it), and to keep general observations in markdown blocks.
A comment in Python starts with a hash sign (`#`). Python ignores everything from the `#` to the end of that line, so a comment can take up a whole line or follow code on the same line.

# 2. Python refresher

You should have some familiarity with Python and programming in general. This is a quick refresher. If you want to learn more Python on your own, this is a good resource: [Learn Python](https://www.learnpython.org/)

### Variables
A variable is really just a way to store information like numbers, strings of words or even whole texts. To use a variable in Python you need to give the variable a name and then give it some value.

Variables can hold numbers (integers or floats) and strings, among other types.

Notice that in order to assign strings to a variable you need to put the string in quotation marks. Whenever Python sees something in quotation marks, it will assume that it is a string. This means that when you want to print the contents of a variable, you do not want to put it in quotation marks, since that will just print the name of the variable instead of the contents.

In [None]:
word = "rejuvenate" #This was the word of the day for Jan 1, 2025, according to Merriam Webster
letters = 10

print(word)
print(letters)

# if we put quotation marks around the variable name it will just print the name of the variable
print("word")
print("letters")

print("The word", word, "has", letters, "letters.")

Sentences are just strings, so we define them with quotation marks.

In [None]:
sentence = "The hotel package includes a day at the spa to rejuvenate guests."
print(sentence)

If you are not sure what type of variable something is, you can just check its type with `type()`.

In [None]:
type(word)

In [None]:
type(sentence)

In [None]:
type(letters)
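
The text above mentions floats, but the examples so far only use an integer and strings. The sketch below shows what `type()` reports for each of the three; `average_length` is a hypothetical variable made up for illustration.

```python
# Hypothetical variable for illustration: a float is a number with a decimal point
average_length = 4.7

print(type(average_length))  # <class 'float'>
print(type(10))              # <class 'int'>
print(type("rejuvenate"))    # <class 'str'>

# isinstance() checks whether a value is of a given type and returns True or False
print(isinstance(average_length, float))  # True
```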

### Lists

A list is a built-in Python data type that holds an ordered sequence of items. Lists are defined with square brackets. You can either create an empty list and populate it later, or create and populate it at the same time.

In [None]:
#create an empty list and append stuff to it
myList = []
myList.append(1)
myList.append(2)
myList.append(3)

print(myList[0]) # prints 1
print(myList[1]) # prints 2
print(myList[2]) # prints 3
print(myList) #prints the whole list

In [None]:
#create and populate a list
myListOfDets = ["a", "the", "my", "that"]

print(myListOfDets)

You can also add an item at a specific position in the list with: `myListOfDets.insert(4, "some")`
Or append to the end with: `myListOfDets.append("one")`
Try those two commands in a new code block and print the list again.
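
As a rough illustration of how `insert()` and `append()` differ, here is a sketch on a separate, hypothetical list called `pronouns`, so that `myListOfDets` stays untouched for the exercise.

```python
# Hypothetical list, used here so that the exercise list stays untouched
pronouns = ["I", "you", "they"]

# insert(index, item) puts the item at the given position and shifts the rest to the right
pronouns.insert(1, "we")
print(pronouns)  # ['I', 'we', 'you', 'they']

# append(item) always adds the item to the end
pronouns.append("it")
print(pronouns)  # ['I', 'we', 'you', 'they', 'it']

# an index at or beyond the end of the list behaves like append()
pronouns.insert(10, "she")
print(pronouns)  # ['I', 'we', 'you', 'they', 'it', 'she']
```

Remember that positions are counted from 0, so index 1 is the second slot in the list.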

### Working with strings

Most of the text data we will work on is in the form of strings. We can manipulate strings with different methods.

The `split()` method, for instance, splits a string into a list. The `sep` argument defines the string that is used as the boundary for a split. By default (when no `sep` is given), the string is split at any run of whitespace, including spaces, tabs and newlines.

Let's use the `split()` method to split the string stored in `text` at each space.

In [None]:
#first, define a string with extra stuff in it, like HTML tags
text = "<p>This is an <b>example</b> string with some HTML tags thrown in.</p>"

In [None]:
#then, put that string into another variable, a list of tokens, that is, words separated by an empty space
tokens = text.split(sep=' ')

In [None]:
print(tokens)
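
Passing `sep=' '` and leaving `sep` out give the same result for the string above, but they differ when a string has several spaces in a row. A small sketch, using a hypothetical string `spaced`:

```python
# Hypothetical example string with two and then three spaces in a row
spaced = "This  has   extra spaces"

# with sep=' ', every single space is a boundary, so empty strings appear
print(spaced.split(sep=' '))
# ['This', '', 'has', '', '', 'extra', 'spaces']

# with no sep, any run of whitespace counts as one boundary
print(spaced.split())
# ['This', 'has', 'extra', 'spaces']
```

For tokenizing ordinary text, calling `split()` with no argument is usually the safer choice.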

But now we have a list with words, yes, but also stuff attached to those words (the HTML tags). What if we want to get rid of those? Python strings also have a `replace()` method, which allows replacing specific characters or their sequences in a string.

Let's begin by removing the initial tag `<p>` from `text` by providing `<p>` as input to its `replace()` method. Note that the tag `<p>` is in quotation marks, as the `replace()` method requires the input to be a string. The method takes two inputs: the string to be replaced (`'<p>'`) and the replacement (`''`). By providing an empty string as the second argument, we essentially remove any matches from the string.

In [None]:
text = text.replace('<p>', '')

In [None]:
print(text)

You could continue this same procedure to get rid of the other tags and then finally create a list of tokens with only words in it. 
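
One possible way to carry that procedure through to the end is sketched below. It starts from a fresh copy of the example string, stored under the hypothetical names `raw` and `clean`, and removes the four tags that occur in this particular string.

```python
# a fresh copy of the example string (hypothetical variable name)
raw = "<p>This is an <b>example</b> string with some HTML tags thrown in.</p>"

# remove each tag that occurs in this string, one at a time
clean = raw.replace("<p>", "")
clean = clean.replace("</p>", "")
clean = clean.replace("<b>", "")
clean = clean.replace("</b>", "")

print(clean)
# This is an example string with some HTML tags thrown in.

# now split the cleaned string into tokens
clean_tokens = clean.split()
print(clean_tokens)
# ['This', 'is', 'an', 'example', 'string', 'with', 'some', 'HTML', 'tags', 'thrown', 'in.']
```

Note that the last token is still `'in.'`, with the period attached, because `split()` only separates at whitespace.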

### Conditions and loops

Conditions and loops are **control flow statements**: they decide which lines of code run and how many times. Remember `myList` above? We used three print statements to print 1, 2, 3. Instead, you can iterate over a list with a `for` loop. You can also set a condition for what to do with an `if` statement.

In [None]:
for x in myList:
    print(x)

In [None]:
for x in myListOfDets:
    print(x)

In [None]:
if type(letters) == int:
    print("The variable", "letters", "is an integer.")

In [None]:
for x in myListOfDets:
    if x == "the":
        print("The list", "myListOfDets", "contains at least one determiner.")
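
Loops and conditions are often combined with `else` and with the `in` operator, which checks whether an item is in a list. The following is only a sketch: the lists `words` and `determiners` and the counter `count` are hypothetical names made up for illustration.

```python
# hypothetical lists for illustration
words = ["the", "spa", "a", "guests"]
determiners = ["a", "the", "my", "that"]

count = 0  # how many determiners we have seen so far

for w in words:
    if w in determiners:
        # this branch runs only when the condition is True
        count = count + 1
        print(w, "is a determiner.")
    else:
        # this branch runs in all other cases
        print(w, "is not in the list of determiners.")

print("Number of determiners found:", count)  # Number of determiners found: 2
```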

### Functions
Functions are blocks of code that allow you to do things repeatedly and more efficiently. Some functions, like `print()` and `type()`, are already pre-defined in Python. But you can also write your own functions. A function definition has the following structure:
* name of the function (`head`)
* parentheses after the name (`()`)
* colon after name and parentheses (`:`)
* indent after the head. Use the `Tab` key on your keyboard to insert the indent

`head():
    function_line(s)`

The head often has one or more arguments, which you put in the parentheses in the head. The function lines often include something that you return (often a manipulation of the arguments). To create a new function, you first have to define it with `def`.    

In [None]:
#first, we define the function
def printing_function():
    print("This function prints this statement.")

In [None]:
#then, we can call it
printing_function()

Note that this is a pretty useless function definition. Python has many pre-defined functions, so for basic tasks we often don't need to write our own. The function `printing_function()` just prints one statement ("This function prints this statement."). The built-in `print()` function can print anything that you put in the parentheses.

In [None]:
#now we define a function with two arguments. Instead of doing something, this function returns something (the sum)
def add_two_numbers(a,b):
    return a + b

In [None]:
add_two_numbers(1,1)

This is also a useless function. The `add_two_numbers()` function just does something that comes pre-built in Python: addition with the `+` operator.

In [None]:
a = 1
b = 1
a + b
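
A function becomes useful when it packages several steps that you would otherwise repeat. As a sketch, the hypothetical function `remove_tags()` below wraps the `replace()` procedure from the section on strings; the function name, its arguments and the example string are all made up for illustration.

```python
def remove_tags(text, tags):
    """Return text with every string listed in tags removed."""
    for tag in tags:
        # replace each tag with an empty string, i.e. delete it
        text = text.replace(tag, "")
    return text

# hypothetical example string
html = "<p>This is an <b>example</b> string.</p>"

clean = remove_tags(html, ["<p>", "</p>", "<b>", "</b>"])
print(clean)          # This is an example string.
print(clean.split())  # ['This', 'is', 'an', 'example', 'string.']
```

Once defined, the same function can be called on any other string and any other list of tags, which is what makes it more useful than the two examples above.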

### Classes and objects

When you want to manipulate several variables and do different things with them, you can define classes. Classes define new types of objects. Python is called an object-oriented language because it is built around objects like these.

In [None]:
#we define a class that contains just one variable and just one function

class myClass:
    variable = "blah"

    def function(self):
        print("This is a message inside the class.")

Now that we have a class, we can define objects of this class. An object is an instance of that class. Those objects will contain the variables and functions available within the class.

In [None]:
#creating an object and making it of type myClass:

myObject = myClass()

In [None]:
#now we can access the properties of the class through the object

print(myObject.variable)
myObject.function()

In [None]:
#and change the value of a variable on this object

myObject.variable = "foo"
print(myObject.variable)
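
Changing a variable through one object only affects that object, not the class or other objects made from it. The sketch below shows this with a hypothetical class `Word`, defined here only for illustration.

```python
# hypothetical class with one variable, for illustration only
class Word:
    pos = "unknown"

first = Word()
second = Word()

# change the variable on one object only
first.pos = "noun"

print(first.pos)   # noun
print(second.pos)  # unknown
print(Word.pos)    # unknown
```

The second object and the class itself still hold the original value.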

### Libraries (modules, packages)

Python itself comes with many libraries (the standard library) that contain pre-defined functions. Other people also create libraries for common tasks, and you install those separately.

Libraries (or packages) are collections of reusable code that do certain things. Libraries are often organized into modules that do more specific things.

Libraries that we are going to use a lot include:

* numpy
* pandas
* nltk
* spacy

To import an entire library, use `import` followed by its name:

`import nltk`

To import a module from that library:

`from nltk import tokenize`
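
The same import patterns can be tried without installing anything, since they work identically for modules that ship with Python. The sketch below uses the standard-library modules `math`, `collections` and `statistics` as stand-ins. The alias `stats` is an arbitrary choice; third-party libraries are often imported with a short alias in the same way, for example `import numpy as np`.

```python
# import a whole module, then reach its functions with a dot
import math
print(math.sqrt(16))  # 4.0

# import one specific name from a module
from collections import Counter
counts = Counter(["the", "a", "the"])
print(counts["the"])  # 2

# import a module under a shorter alias
import statistics as stats
print(stats.mean([1.0, 2.0, 3.0]))  # 2.0
```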