# Module 5: String Manipulation, File I/O, Jupyter Notebook

For our last module, we’ll be dealing with **string manipulation** and **file input and output operations**—all while **utilizing functions** more! I’ll also be introducing a powerful tool called the **Jupyter Notebook**.

> **IMPORTANT**: Make sure to click on **Run** for **_each_ Code cell** to see the output and to define the functions and variables that later cells rely on!


## String Indexing and Methods

Remember when I said **Strings** are also considered **sequential data structures**, since they are just **sequences of characters**? This lets us **analyze and handle strings** in a versatile way: **character by character**, or **segment by segment**. This process is called **string manipulation**.

There are a lot of things you could do with strings, not just in Python, but in other programming languages as well. 

One common thing we can do is change the **letter case**, or the way strings are capitalized. Just by using the built-in methods, we can change strings into **uppercase**, **lowercase**, and even **titlecase**, which many other languages don't offer as a single built-in method!

Let’s say we have a variable called **this_word** containing the **string** value **“hello Friend”**.

```python
this_word = "hello Friend"
```

We can convert this into **uppercase**, **lowercase**, and **titlecase** using the **upper()**, **lower()**, and **title()** functions. Simply type the variable name, followed by a dot or period, followed by the function name and a pair of parentheses. Once we print the results, we'll see the different letter cases:

```python
print(this_word.upper())
print(this_word.lower())
print(this_word.title())
```

```
HELLO FRIEND
hello friend
Hello Friend
```


Aside from these, we can also **capitalize** the **first letter** of the **string**, and even **swap** cases! We have the **capitalize()** and **swapcase()** functions for this. Using the same syntax as the others, we get our desired result:

```python
print(this_word.capitalize())
print(this_word.swapcase())
```

```
Hello friend
HELLO fRIEND
```
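There's one quirk worth knowing. title() starts a new "word" after any non-letter character, apostrophes included. capitalize(), by contrast, only uppercases the first character. Here's a quick sketch comparing the two on a made-up phrase:

```python
phrase = "they're my friends"

# title() treats the apostrophe as a word boundary, so the "r" after it is capitalized
titled = phrase.title()
# capitalize() uppercases only the very first character and lowercases the rest
capitalized = phrase.capitalize()

print(titled)       # They'Re My Friends
print(capitalized)  # They're my friends
```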


Now, remember when I said that **strings** are **immutable**? This means we **CANNOT change a string** in place. Functions like **upper()** return a **new string** and leave the original untouched. So using any of these functions **won't change** the **current value** of the **this_word** variable unless we **reassign** the result.

```python
print(this_word)
```

```
hello Friend
```


Now, if we **reassign** the **variable** to contain the **manipulated string**, that's when the **output _will_ change**. This is **NOT** because we're able to **change** the string, since strings are **immutable**. What's actually happening is that **upper()** creates a **new string**, and we **assign** that new value to the **this_word** variable.

```python
this_word = this_word.upper()
print(this_word)
```

```
HELLO FRIEND
```
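To see immutability directly, try changing a single character of a string. The sketch below uses a separate, hypothetical greeting variable so that this_word stays untouched. The assignment fails with a TypeError, and the usual fix is to build a new string and reassign it:

```python
greeting = "hello Friend"

try:
    # Strings can't be changed in place, so assigning to one character fails
    greeting[0] = "H"
except TypeError as error:
    print("TypeError:", error)

# What works instead: a method builds a NEW string, and we reassign the variable
greeting = greeting.title()
print(greeting)  # Hello Friend
```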


Another common thing we can do with strings is **check** their **characteristics**! These built-in functions all return a **boolean** value of **True** or **False**, depending on the characteristic you're checking for.

First on the list are the **alphanumeric checkers**:
- **isalnum()** - function that checks if the string is composed of **letters** _(like a to z)_, **digits** _(0 to 9)_, or a combination of **both**
- **isalpha()** - function that checks if the string is composed of **_only_ letters**
- **isdigit()** - function that checks if the string is composed of **_only_ digits**

Now, let’s say we have the **this_word** variable with the string value **‘abc123’** and check its value using the three alphanumeric checker functions.

```python
this_word = 'abc123'

print(this_word.isalnum())
print(this_word.isalpha())
print(this_word.isdigit())
```

```
True
False
False
```


As you can see, _only_ **isalnum()** returns **True**. This is because **isalpha()** detected **_digits_** in the string, while **isdigit()** detected **_letters_**! For the latter two to return True, the string should be either **purely letters** or **purely digits**.
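A few edge cases are worth trying as well. In this sketch the sample strings are chosen just for illustration. Notice that a space counts as neither a letter nor a digit, and that an empty string returns False for all three checks:

```python
samples = ["abc123", "abc", "123", "abc 123", ""]
results = {}

for text in samples:
    # Store (isalnum, isalpha, isdigit) for each sample
    results[text] = (text.isalnum(), text.isalpha(), text.isdigit())
    # repr() keeps the quotes visible, so the space and the empty string show up
    print(repr(text), results[text])
```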

Second on the list are the **letter case checkers**: **istitle()**, **isupper()**, and **islower()**.

Basically, these three functions check if the string is in **Titlecase**, **UPPERCASE**, or **lowercase**, respectively. Let's use these checker functions on a new variable named **lowercase_sentence**, this time with the all-lowercase value **“this is a sentence.”**:

```python
lowercase_sentence = "this is a sentence."

print(lowercase_sentence.istitle())
print(lowercase_sentence.isupper())
print(lowercase_sentence.islower())
```

```
False
False
True
```


Only **islower()** returns **True** since the string is neither in **Titlecase** nor in **UPPERCASE**.
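Here's a small sketch that runs all three case checkers side by side on made-up sample strings. Characters without a letter case (digits, spaces, punctuation) are ignored. Even so, a string needs at least one cased letter for any of these checks to return True:

```python
samples = ["This Is A Sentence.", "THIS IS A SENTENCE.", "this is a sentence.", "ABC123", "123"]
results = {}

for text in samples:
    # Store (istitle, isupper, islower) for each sample
    results[text] = (text.istitle(), text.isupper(), text.islower())
    print(repr(text), results[text])
```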

Aside from these, we also have functions that check the characters in a string: **isspace()**, **endswith()**, and **startswith()**. Just like the rest of the built-in checker functions, these are quite self-explanatory, so let's proceed with an example to understand them better:

```python
print(lowercase_sentence.isspace())
print(lowercase_sentence.endswith('.'))
print(lowercase_sentence.startswith('T'))
```

```
False
True
False
```


As you can see, **endswith()** and **startswith()** take the value we're looking for at the **end** or the **beginning** of the **string**. The **endswith()** function returns **True** since the string **ends with** a **period**. The **isspace()** function returns **False**. The string does contain spaces, but **isspace()** only returns **True** when **_every_ character** in the string is whitespace. The **startswith()** function also returns **False** since the string starts with a **_lowercase_ “t”**, **NOT** a capital one. So bear in mind when using **startswith()** and **endswith()** that the **comparison** is **case sensitive**.
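Here's a short sketch of these points. isspace() only returns True for a string made up entirely of whitespace. A common workaround for case sensitivity is to lowercase both the string and the value you're looking for. Finally, startswith() and endswith() also accept a tuple of options, and the check passes if any one of them matches:

```python
lowercase_sentence = "this is a sentence."

# isspace() is True only when EVERY character is whitespace
print("   ".isspace())                # True
print(lowercase_sentence.isspace())   # False

# startswith() is case-sensitive, so lowercase both sides for a case-insensitive check
prefix = "T"
print(lowercase_sentence.startswith(prefix))                  # False
print(lowercase_sentence.lower().startswith(prefix.lower()))  # True

# A tuple checks several possible endings at once
print(lowercase_sentence.endswith((".", "!", "?")))  # True
```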

Now, did you know that there are other functions available to help you check or find characters in a string? We have the built-in function **find()**. This function lets you **find “substrings”**, or **sequences of characters** inside a string, regardless of their position. If it finds the substring, it returns the **index** of its **first occurrence**. Let's reassign **this_word** to a longer sentence and use this function to find the substring **“this”**!

```python
this_word = "this is a sentence. this is another sentence!"
print(this_word.find("this"))
```

```
0
```


See how it returned **0**? This is because it’s the **starting index** of the first **“this”** in the **this_word** variable. Let’s verify this by printing the sentence **from index 0 onwards**. Let’s go ahead and type in the name of the variable, which is **this_word**, followed by **square brackets**. Inside the square brackets, let’s type the index returned which is **0**, followed by a **colon**. 

```python
print(this_word[0:])
```

```
this is a sentence. this is another sentence!
```


See how the printed string starts with the _first_ **“this”** occurrence? Now, let's tweak the values inside the square brackets a bit by adding **4** right **after** the **colon**.

```python
print(this_word[0:4])
```

```
this
```


Now we're just seeing **“this”**! What we just did is called **“slicing”**: we're using the **indexes** to **extract substrings**. The numbers inside the square brackets tell Python which characters to extract. When we say **[0:4]**, we're telling Python to **extract** the characters at **indexes 0 to 3**. Why does it say **4**, then? Because the second number marks where the **_excluded_ indexes** begin, so the slice stops right before it. Therefore, **[1:3]** means **“start at index 1 and stop _before_ index 3”**, which gives us the characters at indexes 1 and 2.

But what about **[0:]**? Leaving out the second number tells Python to extract the characters starting from **index 0 onwards**. We can also use **[:5]**, which works the **opposite** way: it extracts from the **beginning** of the **string** and stops right **_before_ index 5**. Now, what if we simply use **[5]**? Can you guess what will happen based on how we use indexes in lists? It will return the single **character** at **index 5**. Nothing more, nothing less.

Lastly, we can also use **negative integers** as **indexes** when **slicing** or extracting **substrings**! Negative indexes **count from the end** of the string instead of the beginning. Therefore, **[-1]** gives us the **_last_ character** of the **string**, **[-2]** the second to last, and so on.

```python
print(this_word[-1])
```

```
!
```
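Here are a few more slices of the same sentence to try, as a quick sketch. The comments show what each line should print:

```python
this_word = "this is a sentence. this is another sentence!"

print(this_word[:4])    # this        (indexes 0 to 3)
print(this_word[5:7])   # is          (indexes 5 and 6)
print(this_word[5])     # i           (a single character: no colon)
print(this_word[-9:])   # sentence!   (the last 9 characters)
```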


Now, going back to the **find()** function: what if the substring we're looking for is **NOT** in the string? In that case, the function returns **-1**. Because it **_always_ returns -1** when there's no match, we can use this as a check in conditions!

```python
if this_word.find("was") == -1:
    print("There's no 'was' in the string!")
```

```
There's no 'was' in the string!
```
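Two details of find() can surprise you. First, it matches substrings anywhere, even inside other words. Second, it accepts an optional second argument for where to start searching. If you only need a yes or no answer, the in operator is an alternative. A short sketch:

```python
this_word = "this is a sentence. this is another sentence!"

# The first "is" find() sees is the one inside "this", at index 2
print(this_word.find("is"))        # 2

# Start searching from index 1 to skip the first "this"
print(this_word.find("this", 1))   # 20

# `in` gives True/False directly instead of an index
print("was" in this_word)          # False
```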


Finally, let's proceed with string manipulation using the **replace()**, **strip()**, and **split()** functions! Again, the function names are self-explanatory, so let's dive right into examples!

Using **replace()**, we can replace all occurrences of **“this”** with **“that”** in the **this_word** variable. Inside the **parentheses** of the **replace()** function, we pass the substring to replace first, followed by the **new substring**.

```python
print(this_word.replace("this", "that"))
```

```
that is a sentence. that is another sentence!
```


Now, if we want to make this change **permanent**, since strings are **immutable**, we need to **reassign** the **value** to the **this_word** variable.

```python
print(this_word)
this_word = this_word.replace("this", "that")
print(this_word)
```

```
this is a sentence. this is another sentence!
that is a sentence. that is another sentence!
```
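replace() also takes an optional third argument that limits how many occurrences get replaced. This sketch uses a separate, hypothetical sentence variable so that this_word keeps its current value:

```python
sentence = "this is a sentence. this is another sentence!"

# Only the first occurrence is replaced when we pass a count of 1
print(sentence.replace("this", "that", 1))
# that is a sentence. this is another sentence!

# replace() is case-sensitive: "This" doesn't match "this", so nothing changes
print(sentence.replace("This", "That") == sentence)  # True
```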


The **strip()** function, on the other hand, helps us get rid of **leading and trailing characters**, such as **whitespace**, whenever needed! This is particularly useful for file operations, which we'll get into in a minute. For example, we have a variable named **line** with a lot of extra whitespace at both ends.

```python
line = "     XX  This is a sentence with unnecessary whitespaces. XX                  "
line = line.strip()
print(line)
```

```
XX  This is a sentence with unnecessary whitespaces. XX
```


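By using **strip()** _without_ a **parameter**, we were able to get rid of the **whitespace at both ends**. Now, let's **remove** the **“XX”** markings at the beginning and end of the string by passing **“XX”** to the **strip()** function as a **parameter**! Keep in mind that the parameter is treated as a **set of characters** to remove, not as an exact substring. Since this leaves behind the spaces that were next to the markings, we chain a second **strip()** to clean those up.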

```python
line = line.strip("XX").strip()
print(line)
```

```
This is a sentence with unnecessary whitespaces.
```
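Because strip() treats its parameter as a set of characters, not an exact substring, the order of those characters doesn't matter. This sketch shows that with made-up strings. It also shows lstrip() and rstrip(), which clean only one side:

```python
# Any mix of "x", "y", "z" is removed from both ends, whatever order we list them in
print("xyzHelloxyz".strip("zyx"))   # Hello

padded = "   Hello   "
print(repr(padded.lstrip()))  # 'Hello   '  (left side only)
print(repr(padded.rstrip()))  # '   Hello'  (right side only)
```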


Amazing, right? Lastly, we have the **split()** function. This converts the string into a list of strings based on the **separator value** we’re passing as a **parameter**. Therefore, if we use **“Hello Python”** as a string value, we can store each word in a list by using the **split()** function with the **space** serving as the **separator value**.

```python
line = "Hello Python"
print(line.split(" "))
```

```
['Hello', 'Python']
```
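Be careful with repeated separators. Splitting on a single space keeps empty strings between consecutive spaces. By contrast, split() with no argument treats any run of whitespace as one separator. The join() method goes the other way and turns a list back into a string. A quick sketch:

```python
line = "Hello   Python"   # three spaces between the words

# Splitting on " " leaves empty strings where the extra spaces were
print(line.split(" "))   # ['Hello', '', '', 'Python']

# With no argument, any run of whitespace counts as a single separator
words = line.split()
print(words)             # ['Hello', 'Python']

# join() reverses split(): list of strings -> one string
print("-".join(words))   # Hello-Python
```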


And just like that, we learned how to do string manipulation using various built-in Python functions!

## File Operations

When we say **File I/O**, we’re referring to **file input and output** operations, meaning: you can **read and write files** in Python! First, let’s walk you through the file opening concepts.

Python comes with **“access modes”** for opening files. These are short strings that you pass along with the filepath when opening a file:
- The **read** mode, denoted by **‘r’**, which is the **default access mode** when opening files. The file must already exist.
- The **write** mode, denoted by **‘w’**, which lets you **overwrite** the content of an existing file.
  - Otherwise, it will **create a new file** for you if the **filepath** specified **does not exist**.
- The **append** mode, denoted by **‘a’**, which lets you add data to the end of the file **without overwriting** its current content. It also creates the file if it doesn't exist yet.
- The **‘r+’** and **‘a+’** access modes, which let you **read and write** or **read and append**, respectively.
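Here's a sketch of these modes in action. It works inside a temporary folder created with Python's built-in tempfile module, so none of your own files are touched. The filename is hypothetical:

```python
import os
import tempfile

# A throwaway folder, so the demo doesn't touch your real files
folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes_demo.txt")  # hypothetical filename

# 'r' (the default) raises an error if the file doesn't exist yet
try:
    with open(path) as file:
        file.read()
except FileNotFoundError:
    print("'r' needs an existing file")

# 'w' creates the file, and if it already exists, wipes its content first
with open(path, 'w') as file:
    file.write("first\n")
with open(path, 'w') as file:
    file.write("second\n")

# 'a' keeps the current content and adds to the end
with open(path, 'a') as file:
    file.write("third\n")

with open(path) as file:
    content = file.read()

print(content)  # "first" is gone: only "second" and "third" remain
```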

To open a file in Python, we'll use the syntax **with open() as variable_name:**. The **with** statement takes care of **_safely_ opening and closing files** for you: the file is closed automatically once the indented block ends, even if an error occurs inside it, so you don't lose data by forgetting to close it. Simply type **with open**, followed by parentheses, the keyword **as**, a variable name, and a colon. Inside the parentheses, specify the filepath or filename, optionally followed by the access mode.
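Here's a minimal sketch of what the with statement does for you, again using a throwaway file in a temporary folder (the filename is hypothetical). The file's closed attribute shows whether it's still open. The manual try/finally version at the end is roughly what with replaces:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical filename

with open(path, 'w') as file:
    file.write("Hello World\n")
    # Inside the block, the file is still open
    print(file.closed)  # False

# As soon as the indented block ends, Python closes the file for us
print(file.closed)  # True

# Roughly the manual equivalent that `with` saves you from writing:
file = open(path)
try:
    content = file.read()
finally:
    file.close()  # runs even if read() raised an error
```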

Now, let’s try to create our first file as an example! For the filename, let’s name it **“example.txt”**. For the access mode, let’s use the write mode. For the variable name, we’ll simply use file. Again, just like if-else statements, loops, and functions, be mindful of indentations!

For the content, let's just add **“Hello World”** and **“Hello Python”**. There are two ways to do this:
- **write()** - This function writes **a single string** to the file
- **writelines()** - This function writes **a list of strings**, which is handy for **multiple lines**
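The difference between the two is easiest to see side by side. This sketch uses a temporary folder and a hypothetical filename. Note that writelines() does not add any line breaks between the strings you give it:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "write_demo.txt")  # hypothetical filename

with open(path, 'w') as file:
    # write() takes ONE string; it can still span several lines thanks to \n
    file.write("Hello World\nHello Python\n")

with open(path, 'a') as file:
    # writelines() takes a list of strings and writes them back to back
    file.writelines(["Goodbye", "Friend"])

with open(path) as file:
    content = file.read()

print(repr(content))  # 'Hello World\nHello Python\nGoodbyeFriend'
```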

In this case, since we have two lines to add, let's use **writelines()**. We pass it a single **list** of strings, with each line separated by commas inside the square brackets. To spice things up a bit, let's wrap this in a function named **file_writer** to make this file writer reusable!

```python
def file_writer(filepath, lines_to_write):
    # 'w' creates the file, or overwrites it if it already exists
    with open(filepath, 'w') as file:
        file.writelines(lines_to_write)


greetings = ["Hello World\n", "Hello Python\n"]
file_writer("example.txt", greetings)
```

The **\n** at the end of each string is the **newline** character. Since **writelines()** doesn't add line breaks on its own, without it both greetings would end up on the same line. Once you run this cell, you should see a **new file** named **“example.txt”** in the **same folder where this _.ipynb_ file is saved**, containing:

```
Hello World
Hello Python
```

Now, if we want to just add new data to an already existing file, we can use the **append** mode. For this, let’s create a function named **file_appender()**:

```python
def file_appender(filepath, lines_to_write):
    # 'a' adds to the end of the file instead of overwriting it
    with open(filepath, 'a') as file:
        file.writelines(lines_to_write)


farewell_greetings = ["Goodbye Friend\n", "Goodbye Python\n"]
file_appender("example.txt", farewell_greetings)
```

Once you run this cell and check **“example.txt”** again, it should now contain:

```
Hello World
Hello Python
Goodbye Friend
Goodbye Python
```

Now, let's proceed with **opening** and **reading** files. Python comes with three built-in functions to read files:
- **read()** - function that extracts the entire content of the file as a single string
- **readline()** - function that extracts only one line from the file at a time
- **readlines()** - function that stores each line as an element of a list

> **NOTE:** Since **readline()** returns only one line per call, reading a whole file with it requires a loop. **readlines()** gives you the whole list at once, but you'll usually loop through that list to work with each line.
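Here's a sketch that compares the three functions on a small throwaway file (temporary folder, hypothetical filename). It also shows how to loop over the file object itself, which is another common way to read a file line by line:

```python
import os
import tempfile

# Create a small throwaway file to read from (hypothetical filename)
path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")
with open(path, 'w') as file:
    file.write("Hello World\nHello Python\n")

# read(): the whole file as one string
with open(path) as file:
    everything = file.read()

# readline(): one line per call; it returns "" once there's nothing left,
# so the while loop keeps calling it until then
lines_one_by_one = []
with open(path) as file:
    line = file.readline()
    while line != "":
        # rstrip("\n") drops only the newline at the end of the line
        lines_one_by_one.append(line.rstrip("\n"))
        line = file.readline()

# readlines(): every line at once, as a list (newlines included)
with open(path) as file:
    all_lines = file.readlines()

# Looping over the file object itself also gives you one line at a time
with open(path) as file:
    for line in file:
        print(line.rstrip("\n"))

print(repr(everything))   # 'Hello World\nHello Python\n'
print(lines_one_by_one)   # ['Hello World', 'Hello Python']
print(all_lines)          # ['Hello World\n', 'Hello Python\n']
```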

Let’s try using the **readlines()** function to store **each line** into a **list**. For this, we don’t really need to pass an access mode anymore since the **read** mode is the **default access mode** of the **open()** function. This time, we’ll wrap the code in a new function named **file_reader()**.

```python
def file_reader(filepath):
    with open(filepath) as file:
        return file.readlines()


line_list = file_reader("example.txt")
print(line_list)
```

```
['Hello World\n', 'Hello Python\n', 'Goodbye Friend\n', 'Goodbye Python\n']
```


See how it retained **\n** when it read the file? Let’s try to get rid of this using the **strip()** function! Let’s update the function to iterate through the list and display each line.

```python
def file_reader(filepath):
    with open(filepath) as file:
        line_list = file.readlines()

    # Remove the newline character from each line before printing it
    for line in line_list:
        print(line.strip("\n"))


file_reader("example.txt")
```

```
Hello World
Hello Python
Goodbye Friend
Goodbye Python
```


Once we run this, the output should look just like the content of the **.txt** file!

And just like that, we already know how to **open**, **write**, and **read** files in Python! This doesn't only work for **.txt** files but also for **_other_ file types**, such as **CSV (comma-separated values)** files. For CSV files, the easiest approach is to import Python's **_built-in_ library** named **“csv”**. Using the **import** keyword, you can import any **library** you need, including **third-party libraries**, as long as you have them **installed**! I'll walk you through the **package installation** process in a minute.

For this, I have here a sample CSV file named **“sample.csv”** containing random subjects and sample final grades.

```python
import csv

with open('sample.csv', newline='') as csv_file:
    csv_data = csv.reader(csv_file)

    for row in csv_data:
        print(row)
```

```
['Subject', 'Final Grade']
['Filipino', '99']
['Language', '91']
['MAPEH', '93']
['TLE', '87']
['History', '89']
['Science', '90']
['Mathematics', '88']
['Computer', '88']
```
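The csv library can also write CSV files. In addition, csv.DictReader lets you access each value by its column name instead of its position. This sketch writes two of the rows shown above into a temporary file (hypothetical filename) and reads them back:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "grades_demo.csv")  # hypothetical filename

# csv.writer turns each list into one comma-separated row
with open(path, 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Subject", "Final Grade"])
    writer.writerows([["Filipino", "99"], ["Language", "91"]])

# csv.DictReader uses the header row as the keys for every following row
grades = {}
with open(path, newline='') as csv_file:
    for row in csv.DictReader(csv_file):
        # Values are read back as strings, so convert the grade before doing math
        grades[row["Subject"]] = int(row["Final Grade"])

print(grades)  # {'Filipino': 99, 'Language': 91}
```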


Once we run this, you should see each **row** of the CSV file printed as a **list** of its values. Notice that every value, including the grades, is read as a string. Amazing, isn't it? There's so much more to explore with the csv library depending on your needs. Simply go to the [Python documentation for the csv library](https://docs.python.org/3/library/csv.html) to learn about the other functions available. It's also important to add that being inquisitive about documentation isn't just for this library! In order for you to understand and utilize libraries at their best, **please make it a habit to check out their documentation**.


## Package Installation and Jupyter Notebook

Now, before we end this module, let’s walk you through **Jupyter Notebook** and **package installation**!

The **Jupyter Notebook** provides a browser-based front-end for your code. This means you can add text, write code, and execute it all within the same notebook! It's just like a word document, but much more interactive! This is really useful for collaborative activities and reporting.

First, let’s show you what it looks like! Let’s go to **[jupyter.org](https://jupyter.org)**, scroll down until you see **“Jupyter Notebook”**, and click on **“Try it in your browser”** and then click on **“Try Classic Notebook”**. Once it loads up, it should give you an idea of what a Jupyter Notebook would look like! It even has a tutorial ready for you. You can deep dive into that if you want, but I’ll also walk you through the installation process.

Now that you’ve seen it, let’s install it by clicking on **“Terminal”** on your **PyCharm EDU IDE**. Type in:

`pip install notebook`

And simply hit enter. **This is the standard syntax for installing any third-party Python library from PyPI (the Python Package Index)!** The PyPI website also shows this command [at the top of each library page](https://pypi.org/project/notebook/) for easy reference.

After installing, type in:

`jupyter notebook`

on your Terminal and hit enter. It should open a tab in your browser with the same interface as the one we saw a while ago! As you can see, it's displaying the files inside your **current PyCharm project directory**, which is the folder where you ran the command. Now, to open and explore the complementary Jupyter Notebooks I've created for you, simply upload them by clicking on the **“Upload”** button. Make sure that the files you're choosing have the extension **.ipynb**.

Lastly, just to give you an idea of how to **create your own Jupyter Notebook**, click on **New**, and then choose **Python 3 (ipykernel)**. This should open an autosaving **Untitled.ipynb** file. You can easily rename the notebook by double clicking on the name at the header. Let's rename this **“My First Notebook”**.

Now, see how it says `In []:` on the left side of the interface? This means that the current cell is a **code cell** for **Python code**. If you want to start with some **text**, select **“Markdown”** from the **dropdown** at the **top**. To use **headings** for titles, simply use the **hash (#)** sign at the **beginning** of the **text**. The **more hash signs** at the beginning, the **smaller** the **heading** gets.

To **edit** a Markdown cell after running it, simply **double click** on it again to activate its editor. When you click on the **Run** button on the last cell, **Jupyter Notebook** adds a **new code cell** below it by default. So if you need another Markdown cell, simply change it using the dropdown again. Otherwise, we can go ahead and put some code:

`print("This is a Python code!")`

Once you click on the **Run** button, it will show the **output** at the **bottom**! See how convenient this is?

If you want to **save** and **leave the interface**, click on the **File** button, then click on **“Save and Checkpoint”**. After that, click on **“Close and Halt”** to go back to the **Jupyter directory**. To close the directory and stop it from running, click on **“Quit”** at the **top right** corner. After that, you'll see a message saying:

> You have shut down Jupyter. You can now close this tab.
To use Jupyter again, you will need to relaunch it.

Now that you know how to open and create files in **Jupyter Notebook**, I believe you're ready to go and explore on your own!


## Off to the bonus module (optional)

In this module, we learned how to do string manipulation, file operations, as well as how to install and use Jupyter Notebook! If you'd like to explore basic data analysis using Pandas, check out the ipynb file for Module 6, our bonus complementary module.

Congratulations on reaching the end of the course!

## Useful Materials

- [**GeeksforGeeks - Python**](https://www.geeksforgeeks.org/python-programming-language/?ref=shm)
- [**Core Python - DZone Refcardz**](https://dzone.com/refcardz/core-python)
- [**LearnPython - Free Interactive Tutorial**](https://www.learnpython.org/)


## Exercise

1. Open **exercise.txt** using Python, which is located within the same folder as this file. Using loop/s and "\n", try to print the text the same way it's displayed in the file. Create a function for your file reader.
2. Check if the student IDs contain numbers. Create a function for your checker and use it in your file reader function.

```python
# Enter your code here and click on Run to check the results
```