<a href="https://colab.research.google.com/github/joshcova/NLP_Workshop/blob/main/01_Intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction to Python and the Google Colab programming environment

This is a Jupyter notebook, an interactive way of working with Python. A few key concepts before we get started:

**Python**: The programming language we are using

**Google Colab**: The web-based, cloud-hosted interface that we are using to work in Python. There are many local alternatives (e.g. VS Code, PyCharm).

**Jupyter notebook**: The document format (.ipynb file extension). It places code next to its output, which makes working with code more interactive. Please click on the "Open in Colab" icon at the top left to work with the code.

In a Jupyter notebook you can run the code in a cell by clicking the play button, or by pressing **Ctrl + Enter** (run the cell) or **Shift + Enter** (run the cell and move to the next one).

Python is useful for many kinds of applications. Because our focus is on text as data, we will concentrate on the parts of Python that bring us closer to working confidently with text. We will start with the very basics and gradually move towards simple text manipulation. Along the way we will cover:

- Data types
- Variable assignment
- Reading and wrangling data
- NLP applications

In [None]:
# "Obligatory" first command

print("Hello World")

### 1. Basic Python Syntax: Variables, Data Types, and Printing

Python is known for its readability. Let's start with how to define variables and understand the differences between common data types.

In [None]:
# Variables: Assigning values to names
text_data_example = "This is a sentence for analysis."
number_of_documents = 10

# Data Types:
# - Strings (str): Used for text (e.g., 'Hello', 'world')
# - Integers (int): Whole numbers (e.g., 10, -5)
# - Floats (float): Numbers with decimal points (e.g., 3.14, 0.5)
# - Booleans (bool): True or False values

# Let's check the type of our variables
print(type(text_data_example))
print(type(number_of_documents))

# Printing output to the console
print("\nHello, Python!")
print("We are learning about quantitative text analysis.")
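
The cell above only creates a string and an integer. As a small sketch, the other two data types from the list can be checked in the same way. The variable names and values below are made up for illustration and are not part of the workshop data.

```python
# Hypothetical example values, chosen only for illustration
average_sentence_length = 17.5   # float: a number with a decimal point
is_english = True                # bool: either True or False

print(type(average_sentence_length))
print(type(is_english))

# isinstance() checks whether a value has a given type
print(isinstance(average_sentence_length, float))
```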



In [None]:
# Variables do not only store information, but can also be used to make calculations

print(number_of_documents*10)

In [7]:
# Just because a value looks like a number to us does not mean that Python treats it as one

var1 = "12"
var2 = 24

In [None]:
# why does this work?

var2 - 10

In [None]:
# but this does not?

var1 - 10

In [None]:
# Convert the string to an integer so that arithmetic on it works
var1 = int(var1)
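
The conversion step can be sketched end to end as follows. The names `raw_count`, `count` and `back_to_text` are hypothetical and used only for this illustration. The `try`/`except` block catches the error, so the cell keeps running instead of stopping.

```python
raw_count = "12"           # looks like a number, but is stored as a string

try:
    raw_count - 10         # subtracting an integer from a string is not defined
except TypeError as error:
    print(f"TypeError: {error}")

count = int(raw_count)     # convert the string to an integer
print(count - 10)          # → 2
print(type(count))

back_to_text = str(count)  # and back to a string again
print(back_to_text + "0")  # string concatenation, not addition → 120
```

Note that `+` on two strings joins them together, which is why the last line prints `120` rather than `12`.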

### 2. Data structures

Now that we know what some data types look like, let's think about how we can combine them into larger structures.

In [None]:
# Our first data structure: the mighty list
scores = [85, 90, 78, 92]
print(scores)

In [None]:
# You can conduct operations on lists, for example:
# (a notebook cell only displays its last expression, so we print both results)

print(max(scores))
print(min(scores))
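
A few more built-in operations on lists, as a minimal sketch. The list `example_scores` is a separate copy of the numbers above, so the cell can be run on its own.

```python
example_scores = [85, 90, 78, 92]

total = sum(example_scores)            # add up all items
average = total / len(example_scores)  # len() gives the number of items

print(total)                   # → 345
print(average)                 # → 86.25
print(sorted(example_scores))  # a new, sorted list → [78, 85, 90, 92]
```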

In [None]:
# What if we have a long list but only want to extract a single item? We put its index in square brackets.
# Python starts counting at 0, which can feel counter-intuitive: index 2 is the third item.
print(scores[2])

In [None]:
# What about this:
print(scores[0])
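
Zero-based indexing can be summarised in a short sketch. Negative indices and slices are standard Python features that go slightly beyond the cells above; `example_scores` is again a stand-alone copy of the list.

```python
example_scores = [85, 90, 78, 92]

print(example_scores[0])    # first item → 85
print(example_scores[-1])   # last item → 92
print(example_scores[1:3])  # items at index 1 and 2 (3 is excluded) → [90, 78]
print(len(example_scores))  # number of items → 4
```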

In [None]:
# append() adds an item to the end of the list
scores.append(45)

In [None]:
# if you are unsure you can always check what data structure you are working with

type(scores)

In [None]:
# Let's now look at another data structure that is often used in Python code for NLP applications: dictionaries (no surprises there!)
# While a list is written with square brackets, a dictionary is written with curly brackets.
# Unlike in a list, every item (value) in a dictionary is stored under a key.

sentiments = {"positive": 1, "neutral": 0, "negative": -1}
print(type(sentiments))

In [None]:
my_first_dict = {
    "name": "Joe",
    "age": 25,
    "country": "Germany"
}

# Dictionaries store key-value pairs: look up a value by its key
my_first_dict["name"]

In [12]:
my_first_dict = {
    "name": ["A", "B"],
    "age": [25, 26],
    "country": ["Germany", "UK"]
}

# Values can themselves be lists; looking them up by key works the same way
my_first_dict["name"]

['A', 'B']
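
To hint at why dictionaries matter for NLP, here is a minimal sketch of a dictionary used as a tiny sentiment lexicon. The words and scores in `lexicon` are invented for illustration and do not come from a real sentiment dictionary.

```python
# Hypothetical mini-lexicon mapping words to sentiment scores
lexicon = {"good": 1, "bad": -1, "okay": 0}

# Add a new key-value pair
lexicon["great"] = 1

# .get() returns a default value (here 0) instead of raising a KeyError for unknown words
words = ["good", "great", "bad", "unknown"]
total = sum(lexicon.get(word, 0) for word in words)
print(total)  # → 1

# Loop over all key-value pairs
for word, score in lexicon.items():
    print(word, score)
```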

### 3. Working with Strings (Text Data)

Strings are fundamental when dealing with text. Python offers many built-in methods to manipulate and analyze strings.

In [None]:
my_sentence = "Natural Language Processing is exciting for social scientists."

# Length of a string
print(f"Length of the sentence: {len(my_sentence)}")

# Converting to lowercase (useful for text normalization)
lowercase_sentence = my_sentence.lower()
print(f"Lowercase: {lowercase_sentence}")

# Converting to uppercase
uppercase_sentence = my_sentence.upper()
print(f"Uppercase: {uppercase_sentence}")


Length of the sentence: 62
Lowercase: natural language processing is exciting for social scientists.
Uppercase: NATURAL LANGUAGE PROCESSING IS EXCITING FOR SOCIAL SCIENTISTS.


In [None]:
# You can also split a string into its component parts (by default, at whitespace)
# This will be useful when we deal with text tokenization
my_sentence.split()
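
Combining lowercasing and splitting already gives a very simple word count. The sketch below uses `Counter` from Python's standard library; the example sentence is made up, and removing only the full stop is a simplification, not a complete tokenizer.

```python
from collections import Counter

sentence = "Text as data is data about text."

# Lowercase, remove the full stop, then split at whitespace
tokens = sentence.lower().replace(".", "").split()
print(tokens)

# Count how often each token occurs
word_counts = Counter(tokens)
print(word_counts["data"])         # → 2
print(word_counts.most_common(2))  # the two most frequent tokens with their counts
```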

### 4. Wrap-up

In this first notebook we have introduced Python and the Colab environment: variables, data types, lists, dictionaries and basic string methods. This sets the scene for the more advanced NLP applications that we will work on in the next sections.