# More Python Basics

---
---

## Going Further with Python for Text-mining

This notebook builds on the [previous notebook](1-intro-to-python-and-text.ipynb) to teach you a bit more Python so you can understand the text-mining examples presented in the following notebooks.

These are the fundamentals of working with strings in Python, along with other basics that most Python users rely on every day. However, this is just an introduction, and you are not expected to be ready to dive straight into coding after completing this course.

Rather, these notebooks and the accompanying live teaching sessions are meant to give you a taster of what text-mining with Python is about. By the end of the course I hope you will come away with either an interest in learning more or, equally valid, an informed feeling that coding is not for you.

Having said this, there are many approaches to learning programming, and it is often only once you happen upon the right approach for you that you make good progress. It is worth trying different topics, teachers, media and learning styles. I tried to learn programming several times over many years and eventually found the right course that kickstarted my own coding journey.

---
---

## Recap of Strings
Welcome back! Here's a quick recap of what we learnt in [1-intro-to-python-and-text](1-intro-to-python-and-text.ipynb). Strings are the way that Python deals with text. 

Create a *string* and store it with a *name*:

In [None]:
my_sentence = 'The Moon formed 4.51 billion years ago.'
my_sentence

_Concatenate_ strings together:

In [None]:
my_sentence + " " + "It is the fifth-largest satellite in the Solar System."

_Index_ a string. Remember that indexing in Python starts at 0.

In [None]:
my_sentence[16]

_Slice_ a string. Remember that the slice goes from the first index up to but _not_ including the second index.

In [None]:
my_sentence[0:20]

Transform a string with _string methods_. Important: the original string `my_sentence` is unchanged. Instead, a string method _returns_ a new string.

In [None]:
my_sentence.swapcase()
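
As a small sketch of what "returns a new string" means in practice, the example below stores the result of a string method under a second name and then checks that the original is untouched. The name `shouted` is made up for this illustration.

```python
my_sentence = 'The Moon formed 4.51 billion years ago.'

# The method returns a new string; store it under a new name to keep it
shouted = my_sentence.upper()

print(shouted)       # the transformed copy
print(my_sentence)   # the original string, unchanged
```

If you want to keep the transformed text, you need to give it a name like this; otherwise it is displayed once and then lost.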

Test a string with string methods:

In [None]:
my_sentence.islower()

Create a _list_ of strings:

In [None]:
my_list = ['The Moon formed 4.51 billion years ago',
           "The Moon is Earth's only permanent natural satellite",
           'The Moon was first reached in September 1959']
my_list

_Slice_ a list. Add a _step_ to jump through a string or list by more than one. Use a step of `-1` to go backwards. 

In [None]:
my_list[0:3:2]
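
The cell above uses a step of `2`. As a rough sketch of the backwards case, the example below uses a step of `-1` on both a list and a string. It assumes you are happy to leave the start and stop positions empty, which Python reads as "the whole thing". The names `reversed_list`, `reversed_text` and `every_other` are made up for this illustration.

```python
my_sentence = 'The Moon formed 4.51 billion years ago.'
my_list = ['The Moon formed 4.51 billion years ago',
           "The Moon is Earth's only permanent natural satellite",
           'The Moon was first reached in September 1959']

# A step of -1 walks backwards through the whole list
reversed_list = my_list[::-1]

# The same slice works on a string, character by character
reversed_text = my_sentence[::-1]

# A step of 2 takes every other character from the first eight
every_other = my_sentence[0:8:2]

print(reversed_list)
print(reversed_text)
print(every_other)
```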

---
---

## Create a List of Strings with List Comprehensions

Let's get going on some new material.

We can create new lists in a quick and elegant way by using _list comprehensions_. Essentially, a list comprehension _loops_ over each item in a list, one by one, returns something for each item, and collects the results in a new list.

For example, here is a list of strings:

`['banana', 'apple', 'orange', 'kiwi']`

We could use a list comprehension to loop over this list and create a new list with every item made UPPERCASE. The resulting list would look like this:

`['BANANA', 'APPLE', 'ORANGE', 'KIWI']`

The code for doing this is below:

In [None]:
fruit = ['banana', 'apple', 'orange', 'kiwi']
fruit_u = [item.upper() for item in fruit]
fruit_u

The pattern is as follows:

`[return_something for each_item in list]`

The first thing to note is that `for` and `in` are *keywords*, that is, special reserved words in Python. They must appear in exactly this order in every list comprehension.

The other words (`return_something`, `each_item`, `list`) are placeholders for whatever variables (names) you are working with in your case.

![List comprehensions diagram](assets/list-comprehension.png)

> Let's look at some of the details:
 * A list comprehension goes inside square brackets (`[]`), which tells Python to create a new list.
 * `list` is a placeholder for the name of your list. It has to be a list you have already created in a previous step. (Avoid naming a real list `list`, because Python already uses that name for its built-in list type.)
 * The `for each_item in list` part is the loop.
 * `each_item` is the name you assign to each item as it is selected by the loop. Choose something descriptive that helps you remember what it is.
 * The `return_something` part is what gets added to the new list each time the loop selects an item. It could just be the original item, or it could be something fairly complicated.
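
To make the looping more concrete, here is a rough sketch of the same uppercase `fruit` example written out long-hand. It assumes two things this notebook does not otherwise cover: a `for` loop written on its own lines, and the list method `append()`, which adds one item to the end of a list. The names `fruit_loop` and `fruit_comp` are made up for this illustration.

```python
fruit = ['banana', 'apple', 'orange', 'kiwi']

# Long-hand version: start with an empty list and add to it on each loop
fruit_loop = []
for item in fruit:
    fruit_loop.append(item.upper())

# List comprehension version: the same steps in a single line
fruit_comp = [item.upper() for item in fruit]

print(fruit_loop)
print(fruit_comp)
```

Both versions should give the same list. The comprehension is simply a more compact way of writing the loop.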

The most basic example simply returns each item unchanged, so the new list contains all the items of the original list.

Here is an example where we have taken our original list `my_list` and created a new list `new_list` with the exact same items unchanged:

In [None]:
new_list = [item for item in my_list]
new_list

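Why do this? There does not seem to be much point in creating the same list again. The real value comes when we change or filter the items along the way, as the next two sections show.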

### Manipulate Lists with String Methods

By adding a string method to a list comprehension we have a powerful way to manipulate a list.

We have already seen this in the `fruit` example above. Here's another example of the same thing with the 'Moon' list we've been working with. Every time Python loops over an item, it transforms the item to uppercase before adding it to the new list:

In [None]:
new_list_upper = [item.upper() for item in my_list]
new_list_upper

In [None]:
# Write code to transform every item in the list with a string method (of your choice)

Hint: see the [full documentation on string methods](https://docs.python.org/3.10/library/stdtypes.html#string-methods).

### Filter Lists with a Condition

We can _filter_ a list by adding a _condition_ so that only certain items are included in the new list:

In [None]:
new_list_p = [item for item in my_list if 'p' in item]
new_list_p

The pattern is as follows:

`[return_something for each_item in list if some_condition]`

![List comprehensions with condition diagram](assets/list-comprehension-with-condition.png)

Essentially, what we are saying here is that **if** the character "p" is **in** the item when Python loops over it, keep it and add it to the new list, otherwise ignore it and throw it away.

Thus, the new list contains only two of the three strings: the first has a "p" in "permanent" and the second has a "p" in "September". The string about the Moon's formation contains no "p", so it is left out.
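
A condition and a string method can also be combined in one list comprehension. The sketch below is one possible way to do this, not the only one: it keeps only the items containing "p" and transforms each of them to uppercase. The name `new_list_p_upper` is made up for this illustration.

```python
my_list = ['The Moon formed 4.51 billion years ago',
           "The Moon is Earth's only permanent natural satellite",
           'The Moon was first reached in September 1959']

# The condition is tested on the original item;
# only the items that pass are transformed to uppercase
new_list_p_upper = [item.upper() for item in my_list if 'p' in item]

print(new_list_p_upper)
```

Note that the test `'p' in item` is case-sensitive, so a capital "P" would not count as a match.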

In [None]:
# Write code to filter the list for items that include a number (of your choice)

---
---
## Adding New Capabilities with Imports

Python has a lot of amazing capabilities built into the language itself, like being able to manipulate strings. However, in any Python project you are likely to want to use Python code written by someone else to go beyond the built-in capabilities. Code 'written by someone else' comes in the form of a file (or files) separate from the one you are currently working on.

An external Python file is called a *module* (a collection of them is sometimes bundled together as a *package*). In order to use a module in your code, you need to *import* it.

This is a simple process using the keyword `import` and the name of the module. Just make sure that you `import` something _before_ you want to use it!

The pattern is as follows:

`import module_name`

Here is a series of examples. See if you can guess what each one is doing before running it.

In [None]:
import math
math.pi

In [None]:
import random
random.random()

In [None]:
import locale
locale.getlocale()

The answers are: the value of the mathematical constant *pi*, a random number (different every time you run it), and the current locale that the computer thinks it is working in.
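
Notice that, after the import, each thing inside a module is reached by writing the module name, a dot, and then the name of the thing. As a further sketch of this pattern, the example below uses `math.floor()` and `math.ceil()`, which round a number down and up to the nearest whole number. These two are not used elsewhere in this notebook and are chosen purely for illustration, as are the names `rounded_down` and `rounded_up`.

```python
import math

# module_name.thing_inside_the_module
rounded_down = math.floor(4.51)
rounded_up = math.ceil(4.51)

print(rounded_down)
print(rounded_up)
```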

---
---
## Reusing Code with Functions

A function is a _reusable block of code_ that has been wrapped up and given a _name_. The function might have been written by someone else, or it could have been written by you. We don't cover how to write functions in this course; just how to run functions written by someone else.

In order to run the code of a function, we use the name followed by parentheses `()`. 

The pattern is as follows:

`name_of_function()`

We have already seen this earlier. Here is a selection of functions (or methods) like the ones we have run so far:

In [None]:
# 'lower()' is the function (aka method)
my_sentence = 'Butterflies are important as pollinators.'
my_sentence.lower()

In [None]:
# 'isalpha()' is the function (aka method)
my_sentence.isalpha()

In [None]:
# 'random()' is the function
random.random()
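
The parentheses matter. As a small sketch of what happens if you leave them off, the example below writes the same method name with and without `()`. The names `without_parentheses` and `with_parentheses` are made up for this illustration.

```python
my_sentence = 'Butterflies are important as pollinators.'

# Without parentheses, Python only refers to the function; nothing is run
without_parentheses = my_sentence.lower

# With parentheses, Python runs the function and gives back its result
with_parentheses = my_sentence.lower()

print(without_parentheses)   # a description of the function itself
print(with_parentheses)      # the lowercase string
```

If you ever see output that looks like a description of a function rather than a result, a missing `()` is a likely cause.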

---
#### Functions and Methods
There is a technical difference between functions and methods. You don't need to worry about the distinction for our course. We will treat all functions and methods as the same.

If you are interested in learning more about functions and methods try this [Datacamp Python Functions Tutorial](https://www.datacamp.com/community/tutorials/functions-python-tutorial).

---

### Functions that Take Arguments
If we need to pass particular information to a function, we put that information _in between_ the `()`. Like this:

In [None]:
math.sqrt(25)

The `25` is the value we want to pass to the `sqrt()` function so it can do its work. This value is called an _argument_ to the function. Functions may take any number of arguments, depending on what the function needs.

Here is another function with an argument:

In [None]:
import calendar
calendar.isleap(2020)
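
Both functions above take a single argument. To illustrate functions that take more than one, here is a short sketch in which the arguments are separated by commas. The functions `math.gcd()` (greatest common divisor of two numbers) and `calendar.weekday()` (day of the week for a year, month and day) do not appear elsewhere in this notebook and are chosen only as examples. The names `one_argument`, `two_arguments` and `three_arguments` are also made up.

```python
import math
import calendar

# One argument
one_argument = math.sqrt(25)

# Two arguments, separated by a comma
two_arguments = math.gcd(12, 18)

# Three arguments: year, month, day
# The result is a number for the day of the week, where 0 is Monday and 6 is Sunday
three_arguments = calendar.weekday(2020, 2, 29)

print(one_argument)
print(two_arguments)
print(three_arguments)
```

The order of the arguments matters: `calendar.weekday()` expects the year first, then the month, then the day.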

Essentially, you can think of a function as a box. 

![Function black box diagram](assets/function-black-box.png)

You put an input into the box (the input may be nothing), the box does something with the input, and then the box gives you back an output. You generally don't need to worry _how_ the function does what it does (unless you really want to, in which case you can look at its code). You just know that it works.
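
The box idea can be sketched in code as follows. The output of a function can be stored with a name and then used as the input to another function. This is only an illustration: `math.floor()` (which rounds a number down) and the names `root`, `whole` and `lucky` are not part of the notebook's own examples.

```python
import math
import random

# Input: 30. Output: a decimal number, stored with the name 'root'
root = math.sqrt(30)

# The output of one box becomes the input of the next box
whole = math.floor(root)

# A box that needs no input still gives back an output
lucky = random.random()

print(root)
print(whole)
print(lucky)
```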

> ***Functions are the basis of how we 'get stuff done' in Python.***

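For example, we can use the `requests` module to get the text of a Web page. Unlike `math` or `random`, `requests` is not part of Python's standard library: it is a third-party module that has to be installed before it can be imported.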

In [None]:
import requests
response = requests.get('https://www.wikipedia.org/')
response.text[136:267]

The string `'https://www.wikipedia.org/'` is the argument we pass to the `get()` function for it to open the Web page and read it for us.

Why not try your own URL? What happens if you print the whole of `response.text` instead of slicing out some of the characters?

---
---
## Summary

Here's what we've covered - how to:

* Create and manipulate a new list with a **list comprehension**.
* Filter a list with a **condition**.
* **import** a **module** to add new capabilities.
* Run a **function** with parentheses.
* Pass input **arguments** into a function.