# NLP: Python basics

![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png)  
This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

### First few things to know

Python is a programming language, while Jupyter Notebook and Google Colab are environments in which you develop and run your code.  In the **notebook environment**, there are two types of **cells**: **Markdown** (like this one) and **code** (where you write and run your code).  To run a code cell, click on the cell and press `shift + enter` or `ctrl + enter`.

In [None]:
print("Press shift + enter to run the cell and move on to the next cell.")
print("Press ctrl + enter to run the cell and stay at the current cell.")

If you have an object with a long name such as `my_name_is_unusually_long`, the name is hard to remember and to type accurately.  You may type `my_name` and then press `tab` to let the notebook environment **auto-complete** the name.  In Google Colab, instead of pressing `tab`, you may wait or use `ctrl + space`.

In [None]:
my_name_is_unusually_long = "my-name"

In [None]:
# put your cursor at the end of my_name and wait
my_name

When seeing a new function, you may use `?` to read its documentation.

In [None]:
a = "nsysu"
a.upper?

Whenever you encounter a difficulty or an error message, you are encouraged to **ask** Google or ChatGPT.  

### Assign and print

Use `=` to assign the value on the right-hand side to the variable on the left-hand side.  
Then use `print` to print the variable.

In [None]:
greeting = "hello"
print(greeting)

In [None]:
greeting = greeting + ", how are you?"
print(greeting)

By default, the notebook environment displays the representation of the expression on the last line of a cell; expressions on earlier lines are evaluated but not shown.

In [None]:
reply = "fantastic, how about you?"
greeting
reply
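
Because only the last expression is displayed, `greeting` is skipped in the cell above. The following is a minimal sketch of one way to show both values, by calling `print` explicitly; the variable `shown` is introduced here only for illustration.

```python
greeting = "hello, how are you?"
reply = "fantastic, how about you?"

# print() shows every value, not only the last expression in the cell
print(greeting)
print(reply)

# repr() returns the representation that the notebook displays
# for the last expression (note the surrounding quotes)
shown = repr(reply)
print(shown)
```

Note that `print` shows the text itself, while the representation of a string includes its quotes.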

### Data type

The data type tells Python how to handle your object.  For example, you may ask Python to upper-case a string, but you cannot upper-case an integer.

In [None]:
a = 235 # integer
b = "two three five" # string
c = True # boolean
d = (2, 3, 5) # tuple
e = [2, 3, 5] # list
f = {"two": 2, "three": 3, "five": 5} # dictionary
type(f)

In [None]:
a.upper() # raises AttributeError: an integer has no upper method

In [None]:
b.upper()

### String

Python offers different ways to input a string.

In [None]:
a = 'The book "1984" is awesome.\nI enjoyed reading it.'
b = "The book '1984' is awesome.\nI enjoyed reading it."
print(a)
print(b)

To avoid a long line of input, you may use parentheses to split a string over several lines and make your code look better.

In [None]:
c = ("The book '1984' is awesome.\n"
     "I enjoyed reading it.")
print(c)

The symbol `\n` is an **escape character** that stands for "new line".  You may use `r"..."` or `r'...'` (a raw string) to keep the text exactly as typed instead of converting the escape characters.  This is particularly useful when your string contains code written in another language, such as a regular expression.  

See a list of escape characters [here](https://www.w3schools.com/python/gloss_python_escape_characters.asp).

In [None]:
d = r"The book '1984' is awesome.\nI enjoyed reading it."
print(d)

To record all the line breaks and tabs, use triple quotes.

In [None]:
e = """The book '1984' is awesome.
I enjoyed reading it."""
print(e)

A few string methods are particularly useful.  

- `split`: split the string into a list of words
- `join`: merge several strings into one, using the given string as the separator
- `upper`, `lower`, `capitalize`: make letters upper-case, lower-case, or capitalize the sentence
- `startswith`, `endswith`: check if the string starts or ends with some letters
- `isascii`: check if the string is ASCII only

In [None]:
b = "The book '1984' is awesome.\nI enjoyed reading it."
b.split()

In [None]:
b = "The book '1984' is awesome.\nI enjoyed reading it."
words = b.split()
'_'.join(words)

In [None]:
greeting = "hello, How are you?"
print(greeting.upper())
print(greeting.lower())
print(greeting.capitalize())
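
The remaining methods in the list, `startswith`, `endswith`, and `isascii`, return a boolean. Here is a small sketch of how they behave; the sample strings and variable names are chosen only for illustration, and `isascii` assumes Python 3.7 or later.

```python
sentence = "The book '1984' is awesome."

# check the beginning and the end of the string
starts_with_the = sentence.startswith("The")
ends_with_period = sentence.endswith(".")
ends_with_bang = sentence.endswith("!")

# check whether every character is an ASCII character
ascii_only = sentence.isascii()
accented_is_ascii = "café".isascii()  # "é" is not an ASCII character

print(starts_with_the, ends_with_period, ends_with_bang)
print(ascii_only, accented_is_ascii)
```

These checks are case-sensitive, so `sentence.startswith("the")` would be `False` here.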

Use an f-string (a string prefixed with `f`) to embed variables in your string.

In [None]:
book_name = "1984"
description = "awesome"
text = f"The book '{book_name}' is {description}.\nI enjoyed reading it."
print(text)

### `for` loop and `if` statement

A `for` loop allows you to repeat the same work for every item in a collection.  An `if` statement gives you the flexibility to do different things under different conditions.   

In [None]:
for noun in ['apple', 'balls', 'cats', 'dollar', 'engine', 'formulae']:
    if noun.endswith('s'):
        print(f"{noun} is plural.")
    else:
        print(f"{noun} is singular.")

# Grammar is not a set of rules; it is something inherent in the language, and 
# language cannot exist without it. It can be discovered, but not invented.
#                                                             --- Charlton Laird

### NLP task: cleaning

You may use `for` loops to clean up a paragraph.

In [None]:
cat = ("A cat is a small furry animal with sharp claws and a long tail. "
       "They have soft fur that comes in different colors like black, white, or orange. "
       "Cats are known for their agility and ability to climb trees. "
       "They are independent creatures but can also be friendly and enjoy human companionship. "
       "They communicate through meowing and purring. "
       "Cats are often kept as pets and are loved for their playfulness and ability to catch mice.")

Task: Remove the punctuation.  

Note: This can also be done easily with **regular expressions**.
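
As a rough sketch of the regular-expression approach, the standard `re` module can delete every character that is neither a word character nor whitespace. The sample sentence and the pattern below are illustrative choices, not the only way to do it.

```python
import re

sample = "Cats are known for their agility; they purr, meow, and climb!"

# [^\w\s] matches any single character that is neither a word character
# (letter, digit, or underscore) nor whitespace
no_punct = re.sub(r"[^\w\s]", "", sample)
print(no_punct)
```

One difference to keep in mind: `\w` treats the underscore as a word character, so this pattern keeps `_` in the text.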

In [None]:
punctuations = r"""!()-[]{};:'"\,<>./?@#$%^&*_~"""
for p in punctuations:
    cat = cat.replace(p, "")

cat

Remove function words (common words such as "a" or "and" that carry little meaning on their own).

In [None]:
tokens = []
for word in cat.lower().split():
    if word not in ['a', 'is', 'with', 'and']: # add more 
        tokens.append(word)
        
tokens

Count the frequency.

In [None]:
count = 0
for tok in tokens:
    if tok == "cat" or tok == "cats":
        count = count + 1
        
count
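
To count every token at once instead of one word at a time, the standard library offers `collections.Counter`. The sketch below uses a short hypothetical list, `sample_tokens`, standing in for the `tokens` list built above.

```python
from collections import Counter

# hypothetical token list, standing in for `tokens` from the cells above
sample_tokens = ["cat", "small", "furry", "animal", "cats", "known",
                 "agility", "cats", "kept", "pets"]

# Counter maps each distinct token to the number of times it appears
freq = Counter(sample_tokens)
print(freq.most_common(3))

# a missing key counts as 0, so this is safe even if "cat" never appears
cat_count = freq["cat"] + freq["cats"]
print(cat_count)
```

`most_common(n)` returns the `n` most frequent tokens together with their counts, which is often the first thing to look at in a new text.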

### NLP task: discover word relations

In [None]:
new_york = ("New York is a bustling city in the United States. "
            "It is often called 'The Big Apple' and is full of excitement and opportunities. "
            "With its towering skyscrapers and busy streets, New York is a hub of new ideas and innovation. "
            "People from all over the world come to New York to chase their dreams and seek new adventures. "
            "The city never sleeps, and there is always something happening, from concerts and Broadway shows to festivals and parades. "
            "The streets are lined with shops, restaurants, and iconic landmarks like the Statue of Liberty and Times Square. "
            "New York is a melting pot of cultures, where you can taste new foods, hear new languages, and experience a vibrant mix of traditions. "
            "It's a city where old meets new, where history and modernity blend together, creating a unique and exciting atmosphere that captures the spirit of New York.")

One may use [pointwise mutual information](https://en.wikipedia.org/wiki/Pointwise_mutual_information) (PMI) to measure how strongly two words tend to occur together, compared with what would be expected if they appeared independently.
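
A minimal sketch of the computation follows. It assumes that "coming together" means being adjacent words, and it estimates the probabilities from raw counts in a short hypothetical sample; the function `pmi` and the sample text are illustrations, and the `new_york` paragraph above could be cleaned and used instead.

```python
import math
from collections import Counter

# short hypothetical sample, already lower-cased and without punctuation
text = ("new york is a busy city "
        "new ideas come to new york "
        "people love new york")
words = text.split()

unigrams = Counter(words)                 # counts of single words
bigrams = Counter(zip(words, words[1:]))  # counts of adjacent word pairs

def pmi(x, y):
    """Return log2 of p(x, y) / (p(x) * p(y)), estimated from raw counts."""
    p_x = unigrams[x] / len(words)
    p_y = unigrams[y] / len(words)
    p_xy = bigrams[(x, y)] / (len(words) - 1)
    if p_xy == 0:
        return float("-inf")  # the pair never occurs side by side
    return math.log2(p_xy / (p_x * p_y))

print(pmi("new", "york"))   # positive: the pair occurs more often than chance
print(pmi("york", "new"))   # -inf: this order never occurs in the sample
```

A positive value means the pair appears together more often than independence would predict. On a sample this small the estimates are noisy, and PMI tends to give high scores to rare pairs, so the numbers are best read as a rough signal.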

### Further reading

- [_A Whirlwind Tour of Python_](https://jakevdp.github.io/WhirlwindTourOfPython/) by Jake VanderPlas
- GeeksforGeeks [Python Tutorial](https://www.geeksforgeeks.org/python-programming-language/)