# MTH4000 Programming in Python I - Lecture 8
Module organisers Dr Matthew Lewis and Prof. Thomas Prellberg

## github.com

You may wonder why we asked you to create a GitHub account. In a nutshell, GitHub is a very useful tool to store your programming work (and other information), to share it, and even to collaborate on projects. While a detailed description of GitHub (including, for example, its version control features) is beyond the scope of this lecture, I want to show you a bit of what one can do within this environment.

You have all been asked to create a repository "MTH4000" in your GitHub account. I will show you now how to 
- create a new Jupyter Notebook file,
- edit its content,
- and even run the code it contains.

For example, the repository that I created for this module can be found directly under the GitHub account https://github.com/T-Prellberg by adding the repository name, i.e. https://github.com/T-Prellberg/MTH4000.

![image.png](attachment:image.png)

Note that everyone can see this repository, as it is public. Once I log in, I get the option to modify the content.

![image-2.png](attachment:image-2.png)

You can do exactly the same with your own repository in your own account.

### Creating or uploading a new file.

It is fairly easy to create and edit a text file. We can change the content of the README file by selecting the pencil icon, for example. When doing so, we see a commit button appear. Once we are done with any changes, we need to **commit** the updated file to the repository:

![image.png](attachment:image.png)

Once committed, we see that the README message in the repository has changed.

![image.png](attachment:image.png)

If I want to create a new file, I simply click on "Add file", highlighted above, which gives me the option to create a new file or upload files. If I create a new file, I can then directly choose a file name and enter text.

![image.png](attachment:image.png)

Again, I must not forget to **commit** any changes to the repository once I am done. Afterwards my new file shows in the repository:

![image.png](attachment:image.png)

The main drawback is that this direct creation and editing only really works with simple text files. If we tried to create a blank Jupyter notebook in this way, we would not get very far. Let's create a blank file with an .ipynb extension.

![image.png](attachment:image.png)

If we now try to view this, we are told that there is a problem:

![image.png](attachment:image.png)

If you ever looked at a notebook with a text editor, you might already understand why. Even a blank notebook needs to contain some minimal text. For example, a freshly created notebook in our Jupyter Environment looks like this:

```json
{
 "cells": [],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}
```

We can add this text by hand and the "Invalid Notebook" message disappears. We actually see nothing, because this is now just an empty notebook!

![image.png](attachment:image.png)
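
Instead of typing this text by hand, the same minimal content could be produced with a few lines of Python. The following is only a sketch: the file name `blank.ipynb` and the use of a temporary folder are assumptions made for illustration, and the dictionary simply mirrors the text shown above.

```python
import json
import os
import tempfile

# the minimal content of an empty notebook, as shown above
empty_notebook = {
    "cells": [],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# "blank.ipynb" is a hypothetical file name; a temporary folder keeps the sketch tidy
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "blank.ipynb")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(empty_notebook, f, indent=1)   # write the dictionary as JSON text
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)                    # read it back to check

print(loaded == empty_notebook)
```

A notebook file is just text in the JSON format, which is why the standard module `json` can write and read it.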

An easier option to get started is to upload an existing notebook. Let's upload the prepared file `example.ipynb`. This works with a simple drag-and-drop, followed by a **commit**. We now have four files in our repository.

![image.png](attachment:image.png)

The notebook contents are viewable as well.

![image.png](attachment:image.png)

Selecting "Code" or trying to edit this file gives us the raw text, however. So how do we edit (and run?) notebooks like we are used to from Jupyter? For this, we introduce https://github.dev.

## github.dev

The previous section was mainly to show you some of the basic functionality of github. To really work with jupyter notebook files, we need to move from `github.com` to `github.dev`. 

There are easy ways to switch from a repository in github.com to github.dev:
1. typing "." or ">" opens the repository in the same tab or in a new tab, respectively
2. replacing "github.com" by "github.dev" in the url opens the repository in `github.dev`

![image.png](attachment:image.png)
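
The second option is nothing more than a text replacement in the address. As a small illustration (the function name `to_github_dev` is made up for this sketch), this could be written as:

```python
def to_github_dev(url):
    """Return the github.dev address for a github.com repository URL."""
    # replace only the first occurrence, i.e. the host name
    return url.replace("github.com", "github.dev", 1)

print(to_github_dev("https://github.com/T-Prellberg/MTH4000"))
```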

Note that when editing on `github.dev` it is absolutely important to save your work by **regular commits**. The `github.dev` editor runs in your browser and stores your work in the browser's local storage only.

We can now open files by clicking on them on the left and easily edit them. Note that we can add code and markdown boxes, and whatever we enter is displayed in a way similar to what we are used to from Jupyter.

![image.png](attachment:image.png)

Note that it comes with a wealth of helpful popup boxes, designed to support your coding.

![image.png](attachment:image.png)

Importantly, when you are done, you again must **commit** your changes to the repository by selecting the source control icon.

![image.png](attachment:image.png)

You commit the changes and "push" them to the repository on github. Note that this time you need to enter some text (think of it as a comment to yourself of what you have just done). Let's have a look at the repository on github.com:

![image.png](attachment:image.png)

Note that my comments appear next to the file (the last one was "added some code boxes"). And if we view `example.ipynb`, we now see the updated version.

![image-2.png](attachment:image-2.png)

The file has indeed been updated as intended! We repeat: when editing on `github.dev` it is absolutely important to save your work by regular commits. The `github.dev` editor runs in your browser and stores your work in the browser's local storage only.

## Running Python Code

`github.dev` itself does not run code in notebooks. In order to do so, you need to connect a Jupyter notebook application running elsewhere. You may already have noticed the "Select Kernel" option above. We now go through how to achieve this by connecting `github.dev` to a kernel running locally on your machine.

(This doesn't work if you are using AppsAnywhere. In that case you need to select "github codespaces", which creates and connects to a virtual Kernel, and you end up doing cloud-based computing.)

Here we will go with what you are familiar with already.

This might sound complicated, but is actually quite easy, as you are already familiar with the mechanics of the first step. In the IT labs or on AppsAnywhere, open your black Python window. (If you run Anaconda on your own computer, you need to find "Anaconda prompt" among your applications to open that window, and you can do just the same.)

Instead of just typing `jupyter notebook`, you now need to run the following command:
```
jupyter notebook --no-browser --NotebookApp.allow_origin_pat=https://.*vscode-cdn\.net
```
This starts a notebook server without opening a browser window, and allows `github.dev` to connect to that server. The important bit is to find the lines showing URLs similar to this:

![image.png](attachment:image.png)

Now we move back to `github.dev` in the browser window and find the "Select Kernel" option.

#### First time installation

The first time you do this, it gets a bit tricky. One way that worked for me was to try to run a code box. You will then be prompted to choose a Kernel source to install.

![image-2.png](attachment:image-2.png)

Selecting "Install/Enable suggested extension Python + Jupyter" will give you popup windows. Unfortunately these contain some warnings, but simply select "Install Anyway".

![image.png](attachment:image.png)

Once this is done (and you should only have to do this once for your repository), you can move on to the Kernel selection itself.

#### Kernel selection

Now the above prompt will give you the option to choose an existing Jupyter Server (the one you just started on your machine).
![image.png](attachment:image.png)

Selecting this, you are given the option to enter the URL from the black Python window.
![image.png](attachment:image.png)

Paste the URL here (you'll have a different token, of course)
![image.png](attachment:image.png)

Next, connect to the server and hit enter again to confirm your input.
![image.png](attachment:image.png)

Finally we're in business and can indeed select the Kernel!

![image.png](attachment:image.png)

#### Running code

Note how `github.dev` now shows more details, indicating that I indeed can communicate with my locally running python kernel and can run the notebook as I wish to.

![image.png](attachment:image.png)

When I click "Run All", for example, I get the desired result. Once done, I must not forget to **commit** any changes to my repository. Once this is done, I see the updated notebook on github.com, including the results from the "Run All".

![image.png](attachment:image.png)

**Important:** In the lab this week, you will be asked to create some content on github.com (using github.dev). During the test, you again need to enter the URL of the repository you worked on. In my case, this would be https://github.com/T-Prellberg/MTH4000.

## Strings, File I/O, and all that

So far, we have used Python as an isolated sandbox to work in. What is sorely missing is a way to interact with the external world, i.e. to be able to input and output data. In this next part we will look at how Python deals with data input and output, and with reading, creating, and writing text.

In [1]:
# we haven't run any Python yet, so let's not forget to import numpy
import numpy as np

## Intermezzo: Copying lists (and other composite data)

As we are now going to deal more with composite data such as lists, we need to have a closer look at how Python stores data. Let's consider the following code, in which we have created a list `A` and assigned `A` to `B`.

In [2]:
A=[1,2,3,4]
B=A
print(A,B)

[1, 2, 3, 4] [1, 2, 3, 4]


Let's now modify `A` and `B` by changing some list entries.

In [3]:
A[0]=-1
B[3]=-4
print(A,B)

[-1, 2, 3, -4] [-1, 2, 3, -4]


You see that the change in `A` has affected `B` and the change in `B` has affected `A`. This is because `A` and `B` are actually the same list: when `B` is created by the statement `B=A`, Python does not copy the whole content of `A` and assign it to `B`, but simply ensures that `A` and `B` refer to the **same place** in the computer memory. The function `id()` makes this visible - it returns a number representing the location of that memory.

In [4]:
print(id(A),id(B))

1180183840576 1180183840576


For simple, non-composite data (i.e. data that is not a collection of other things, but single integers, floats, complex numbers, etc.) things look different: such data behaves as if it gets copied over when you create a new variable. In this case, `b` remains unaffected when you change the content of `a`.

In [5]:
a=10.0
b=a
a=2
print(a,b)

2 10.0


Now `a` and `b` are stored in different places of the computer memory.

In [6]:
print(id(a),id(b))

140733649818440 1180181293584


The reason why Python does not copy composite data on assignment is efficiency. Most of the time when dealing with really large objects (such as a 1000x1000 matrix with a million entries) you want to avoid creating copies of whole chunks of memory.
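
To make this concrete, here is a small sketch with a list of a million entries (the names `big` and `alias` are chosen just for this illustration). The assignment creates a second name for the same list, so no memory is copied.

```python
big = [0.0] * 1_000_000   # a list with a million entries
alias = big               # no copy is made, this is just a second name

alias[0] = 1.0            # change the list through the second name ...
print(big[0])             # ... and the change is visible through the first
print(id(big) == id(alias))
```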

### Equality versus object identity

The above is the reason why in the table of comparisons I showed you a few weeks ago there was an entry called object identity, comparing objects using 
```python 
a is b
```
If you have been thinking along, you may already be able to guess what the difference between `a is b` (identity) and `a==b` (equality) is.

Let's create two separate lists with equal entries:

In [7]:
C=[1,2,3]
D=[1,2,3]
print ('C is D:', C is D)
print ('C == D:', C==D)

C is D: False
C == D: True


This means that in this case, even though the content of `C` equals the content of `D`, `C` and `D` are actually different lists, in that they are stored in different places in the memory.

In [8]:
print(id(C),id(D))

1180183731456 1180183731776


Unfortunately, there is an apparent anomaly when considering simple data types. Let's look again at the example above, but check for object identity.

In [9]:
a=10.0
b=a
print(a is b)
a=2
print(a is b)

True
False


This looks like some inconsistency, as I have just told you that `a` and `b` are stored in different locations of computer memory. I really need to be more precise: what is stored in computer memory is actually the value `10.0`, and `a` and `b` simply point to where this value is stored. Hence `a is b` is still true after the assignment `b=a` and becomes false once `a` is modified using `a=2`. Now the value `2` is stored elsewhere in the computer memory and `a` now points to where that value is stored.

Let's give another example of this.

In [10]:
x=997
print(id(x))
x=x+1
print(id(x))
y=997
print(id(y))

1180181295088
1180181294928
1180181293872


The value 997 is stored at the address XYZ and `x` points to that address. When `x` is increased by 1, the value 998 is stored at the address PQR and `x` now points to that new address. Subsequently, when `y` is assigned the value 997, Python creates a new instance of the value 997 at yet another address UVW. (Note: these addresses will be different for each of you, as this depends on the computer you use as well as any other programs you are running concurrently.)

Finally, some often used values might be recycled, as the following example shows, where 997 has been replaced by 1.

In [11]:
x=1
print(id(x))
x+=1
print(id(x))
y=1
print(id(y))

140733649818408
140733649818440
140733649818408


Note that now `y` points to the same location as `x` did initially.

You don't really need to know all the intricate details of this section, but you need to be aware of it so you can avoid making mistakes such as trying to copy data by assignment. We'll deal with how to do this correctly in the next section.

### Creating copies of lists

Trying to copy lists (and other composite data) by assignment is a frequent programming mistake made by beginners. If you really want to create a true copy of data, and if `B=A` does not work for that, what can you do?

There are actually a few ways you can help yourself. The "correct" way in Python 3 is to use the method `.copy()` for duplicating data.

In [12]:
A=[1,2]
B=A.copy()
print ('A is B:', A is B)
print ('A == B:', A==B)

A is B: False
A == B: True


There are two other ways you will sometimes see in written code. Slicing, for example, creates a copy, so slicing with `[:]` will work.

In [13]:
C=[1,2]
D=C[:]
print ('C is D:', C is D)
print ('C == D:', C==D)

C is D: False
C == D: True


And finally, using the function `list()` on a list also creates a copy.

In [14]:
C=[4,5]
D=list(C)
print ('C is D:', C is D)
print ('C == D:', C==D)

C is D: False
C == D: True


## Strings

A string in Python is a sequence of characters. We have already informally encountered strings in many places, for example when we used the `print()` function to give out text strings such as 
```python
print("exact root found")
```
or when we used strings to input data, or when we used document strings in functions.

And you will have noticed that Python allows different delimiters of strings: `'`, `"`, `'''`, and `"""` are all allowed, with the last two allowing a string to extend across multiple lines and include linebreaks, as when we wrote document strings in functions.

And one of your lecturers prefers `"hello"`, whereas the other prefers `'hello'`.

In [15]:
print("Hello")
print()
print('Hello')
print()
print('''Hello''')
print()
print("""Hello 
World!""")
type("Hello")

Hello

Hello

Hello

Hello 
World!


str

Any character can appear in a string, but if we want to include special characters in a string while writing Python code, we sometimes need to use special character sequences using a backslash `\` to include it in a string.

In [16]:
print('\'Hello\'') #single quotes \'
print("'Hello'")
print()
print('Hello\tWorld')  #tab \t
print()
print('Hello\nWorld')  #new line \n
print()
print('Hello\\World')  #backslash \\
print()
'Hello\nWorld' # note that the string looks different when not printed

'Hello'
'Hello'

Hello	World

Hello
World

Hello\World



'Hello\nWorld'

In [17]:
'Hello\nWorld'[5] # newline is a single character

'\n'
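
A short sketch to confirm that an escape sequence counts as a single character, using the built-in functions `len()` and `repr()`:

```python
s = 'Hello\nWorld'
print(len(s))          # 5 letters + 1 newline + 5 letters
print(repr(s))         # repr() shows the string as you would type it in code
print(len('\\'))       # an escaped backslash is also a single character
```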

### Indexing strings

Strings behave like other sequences such as lists and tuples, i.e. we can access single characters and substrings by indexing and slicing.

In [18]:
my_string="Hello World!"
print(my_string[6])
print()
print(my_string[0:5])
print()
print(my_string[-6:-1])
print()
print(my_string[::-1])
print()
print(my_string[2:9:2])

W

Hello

World

!dlroW olleH

loWr


You also have already noticed that we can loop over strings without having to convert them into lists first.

In [19]:
for letter in my_string:
    print(letter)

H
e
l
l
o
 
W
o
r
l
d
!


However, strings are immutable, i.e. we cannot change parts of a string once it is defined, so they behave more like tuples than lists.

In [20]:
#this will produce an error
#print(my_string)
#my_string[6]='w'
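
If you want to see the error without stopping the notebook, you can catch it. The sketch below also shows one possible way around the problem, which is to build a new string from slices of the old one (the name `new_string` is made up for this example).

```python
my_string = "Hello World!"
try:
    my_string[6] = 'w'          # strings do not support item assignment
except TypeError as error:
    print("TypeError:", error)

# build a new string from slices instead
new_string = my_string[:6] + 'w' + my_string[7:]
print(new_string)
```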

Once a string has been assigned, there is basically only one way to remove it, using the keyword `del` (which can be used to remove anything else we have defined as well).

In [21]:
#print(my_string)
#del my_string
#print(my_string)

### String operations

Strings can be concatenated with `+`, and multiplication by an integer $n$ using `*` gives $n$-fold concatenation.

In [22]:
print('Hello '+'World!')
print(3*'Hello ')
print('Hello '*3) # order doesn't matter
#print('Hello'*'Hello') # this does not work

Hello World!
Hello Hello Hello 
Hello Hello Hello 


Functions we already used with other sequences work with strings as well.

In [23]:
print(len('Hello'))
print(list(enumerate('World')))
print(list(reversed('Today')))

5
[(0, 'W'), (1, 'o'), (2, 'r'), (3, 'l'), (4, 'd')]
['y', 'a', 'd', 'o', 'T']


### String methods

There are many ways to change strings in Python. A good listing of all the methods can be found [here](https://www.programiz.com/python-programming/strings-method). You may have to use this quite a few times when working with strings in the lab. I will just give you a few examples here.

In [24]:
my_string="hello world!"
print(my_string)
print(my_string.capitalize())
print(my_string.center(80,"*"))
print(my_string.split())
print(my_string.split('o'))

hello world!
Hello world!
**********************************hello world!**********************************
['hello', 'world!']
['hell', ' w', 'rld!']


The last method `.split()` is a good way to break down longer text strings into their parts. It takes as optional argument a separator string (the default is to split at any whitespace), so if you work with a \*.csv file, you can split using a comma as separator string (csv is one format that Excel can understand; the acronym means "comma separated values").

In [25]:
print("123.45,38.9,Vauxhall".split()) # does not split anything, there's no space
print("123.45,38.9,Vauxhall".split(',')) # splits into comma separated values

['123.45,38.9,Vauxhall']
['123.45', '38.9', 'Vauxhall']
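
After splitting, the parts are still strings. As a sketch of a typical next step (assuming, as in this example line, that the first two entries are numbers), they can be converted with `float()`. The method `.join()` does the opposite of `.split()`.

```python
line = "123.45,38.9,Vauxhall"
parts = line.split(',')

# the first two entries are numbers stored as text, so convert them
numbers = [float(p) for p in parts[:2]]
label = parts[2]
print(numbers, label)

# .join() is the counterpart of .split()
print(','.join(parts) == line)
```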


### Formatting

Formatting is a great way to produce good looking output. You can put placeholders `{}` into a string, which are then filled by the arguments of `.format()`. The placeholders can contain formatting specifications, which is very useful for formatting numbers. I will only give you a few examples here; you should consult [here](https://www.programiz.com/python-programming/methods/string/format) if you need to know more.

In [26]:
print("Hello {} world!".format("happy"))
print("Binary representation of {0} is {0:b}".format(12))
print("Exponent representation: {0:e}".format(1566.345))
print("One third is: {0:.3f}, and one quarter is: {1:.3f}".format(1/3,1/4))

Hello happy world!
Binary representation of 12 is 1100
Exponent representation: 1.566345e+03
One third is: 0.333, and one quarter is: 0.250
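
Formatting specifications can also set a width and an alignment, which is handy for printing small tables. The data in the following sketch is made up for illustration.

```python
# hypothetical data: names and values to be printed as a small table
rows = [("one third", 1/3), ("one quarter", 1/4), ("pi", 3.14159)]
for name, value in rows:
    # <12 pads the name to 12 characters (left aligned),
    # >8.3f right-aligns the number in 8 characters with 3 decimals
    print("{:<12}{:>8.3f}".format(name, value))
```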


For those of you who know the programming language C, there is also a way to use C-style formatting. Let's compare both:

In [27]:
x=np.pi
print('The value of Pi is {0:4.2f} rounded to two decimal places.'.format(x))
print('The value of Pi is %4.2f rounded to two decimal places.' %x)

The value of Pi is 3.14 rounded to two decimal places.
The value of Pi is 3.14 rounded to two decimal places.


The most modern formatting is by means of something called [f-string](https://docs.python.org/3/tutorial/inputoutput.html), short for formatted string literal (the letter `f` is placed directly in front of the string to be used in the formatting). It's best seen by example:

In [28]:
print(f'The value of Pi is {x:4.2f} rounded to two decimal places.')

The value of Pi is 3.14 rounded to two decimal places.


This is the easiest to read, as the variable inserted into the string appears right where it is used. But when reading code, you will find all three ways, depending on when the code was written and if the programmer is used to writing C code.
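
The following sketch puts the three ways side by side and checks that they give the same string. It uses `math.pi` from the standard library in place of `np.pi` so that it runs on its own.

```python
import math

x = math.pi   # same value as np.pi
s1 = 'Pi is {0:4.2f}'.format(x)
s2 = 'Pi is %4.2f' % x
s3 = f'Pi is {x:4.2f}'
print(s1 == s2 == s3)

# an f-string placeholder may contain an expression, not just a variable name
print(f'One third is {1/3:.3f}')
```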

And finally, there's a cumbersome way of constructing strings by hand.

In [29]:
print('The value of Pi is '+str(round(x,2))+' rounded to two decimal places.')

The value of Pi is 3.14 rounded to two decimal places.


## File Input/Output

### Opening and closing files

Files are named locations on external memory where data is stored. To access a file for reading or writing we need to open it, and when we are finished we need to close it again.

In Python, this is done with the built-in function `open()` and the method `.close()`. The function `open()` returns a file object, which we assign to a variable to be able to access it for read or write operations.

In [30]:
f=open("test.txt",'w') #open file in current directory
### perform file operations
f.close()

The default mode is to open the file for reading. Other modes can be specified when opening a file. Here, we shall use `r` (read, default), `w` (write), and `a` (append). For other modes, see [here](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files):

`r` opens a file for reading (default). If the file does not exist, the operation fails.

`w` opens a file for writing. If the file exists, it gets overwritten. If the file does not exist, it gets created.

`a` opens a file for appending. If the file exists, anything written to it gets appended to the end of the file. If the file does not exist, it gets created.
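
The three modes can be tried out in a few lines. This is only a sketch: the file name `modes.txt` and the temporary folder are assumptions made so that no existing file gets overwritten, and `.read()` returns the whole file content as a single string.

```python
import os
import tempfile

# a temporary folder and the file name "modes.txt" are assumptions of this sketch
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "modes.txt")

    try:
        f = open(path, 'r')          # the file does not exist yet
    except FileNotFoundError:
        print("cannot read a file that does not exist")

    f = open(path, 'w')              # creates the file
    f.write("first\n")
    f.close()

    f = open(path, 'w')              # overwrites the previous content
    f.write("second\n")
    f.close()

    f = open(path, 'a')              # appends to the end
    f.write("third\n")
    f.close()

    f = open(path, 'r')
    content = f.read()
    f.close()

print(content)
```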



The way text gets stored in a file (its encoding) is platform dependent by default. If you want files you create to be readable on different platforms, you need to specify the encoding explicitly. I suggest you use
```python
f=open("test.txt",'r',encoding='utf-8')
```

If an open file does not get closed correctly, data in it might become corrupted. (This is why you should never just switch off a computer, but rather shut it down.) When developing code, this can happen if the code you are writing contains errors and execution of the code gets terminated before the `.close()` is executed. It is therefore safer to open a file using the `with` statement as follows.
```python
with open("test.txt") as f:
    ### perform file operations
    pass  # placeholder, as a block cannot consist of a comment only
```
In this way, the `.close()` method is performed internally when the code block inside `with` is exited.
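
A small sketch to check this claim: a file object has an attribute `.closed`, and it is `True` after the `with` block even if an error occurred inside. The file name `with_demo.txt`, the temporary folder, and the deliberate division by zero are all just for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "with_demo.txt")   # hypothetical file name
    try:
        with open(path, 'w') as f:
            f.write("some text\n")
            1 / 0                     # deliberate error inside the block
    except ZeroDivisionError:
        print("the code inside the with block failed")
    closed_after_error = f.closed     # the attribute .closed is True for a closed file

print(closed_after_error)
```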

### Reading and writing files

Files essentially contain text strings called lines. `readlines()` returns the whole file as a list of strings, and `writelines(lines)` writes `lines` to the file, where `lines` is a list of strings. Similarly, `readline()` reads a single line, and `write(line)` writes the string `line` to the file.

Lines are slightly different from strings in that they normally contain a newline character at their end (unless it is the last line of a file, in which case this may be missing).

Let's look at an example.

In [31]:
data=['Professor Thomas Prellberg\n',
      'Professor of Mathematics\n',
      'Email: t.prellberg@qmul.ac.uk\n',
      'Telephone: +44 (0)20 7882 5490\n',
      'Room Number: Mathematics Building, Room: MB523\n',
      'Website: http://www.maths.qmul.ac.uk/~tp/\n']
with open("test.txt",'w') as f:
    f.writelines(data)

The above writes the content of `data` to a file named `test.txt`. Let's read this file and check its content.

In [32]:
with open("test.txt",'r') as f:
    data1=f.readlines()
print(data1)

['Professor Thomas Prellberg\n', 'Professor of Mathematics\n', 'Email: t.prellberg@qmul.ac.uk\n', 'Telephone: +44 (0)20 7882 5490\n', 'Room Number: Mathematics Building, Room: MB523\n', 'Website: http://www.maths.qmul.ac.uk/~tp/\n']


In [33]:
data1==data

True

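The method `writelines()` does not add any newline characters itself. If the newline character is missing at the end of a string, the next string is written directly after it on the same line.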

In [34]:
office_hours=['Learning Cafe: Thursdays 2:00-3:00 ',
              'during term times, also by email appointment\n']
with open("test.txt",'a') as f:
    f.writelines(office_hours)
with open("test.txt",'r') as f:
    data1=f.readlines()
print(data1)

['Professor Thomas Prellberg\n', 'Professor of Mathematics\n', 'Email: t.prellberg@qmul.ac.uk\n', 'Telephone: +44 (0)20 7882 5490\n', 'Room Number: Mathematics Building, Room: MB523\n', 'Website: http://www.maths.qmul.ac.uk/~tp/\n', 'Learning Cafe: Thursdays 2:00-3:00 during term times, also by email appointment\n']


Note that the two appended strings have been joined into one line.

A file itself can be used as an iterable, i.e. we can go through all the lines in a file in a for loop, which is very convenient!

In [35]:
with open("test.txt",'r') as f:
    for line in f:
        print(line)

Professor Thomas Prellberg

Professor of Mathematics

Email: t.prellberg@qmul.ac.uk

Telephone: +44 (0)20 7882 5490

Room Number: Mathematics Building, Room: MB523

Website: http://www.maths.qmul.ac.uk/~tp/

Learning Cafe: Thursdays 2:00-3:00 during term times, also by email appointment



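Note how the `print()` function produces blank lines here: each line still ends with a newline character, and `print()` adds another one. Let's remove these. We can either remove the last character in each line using the slice `[:-1]` (which cuts off a real character if the last line of the file has no newline), or use the method `.rstrip()`, which removes the newline together with any other trailing whitespace.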

In [36]:
with open("test.txt",'r') as f:
    for line in f:
        print(line.rstrip())

Professor Thomas Prellberg
Professor of Mathematics
Email: t.prellberg@qmul.ac.uk
Telephone: +44 (0)20 7882 5490
Room Number: Mathematics Building, Room: MB523
Website: http://www.maths.qmul.ac.uk/~tp/
Learning Cafe: Thursdays 2:00-3:00 during term times, also by email appointment
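
The single-line methods `write(line)` and `readline()` mentioned at the start of this section work in the same spirit. Here is a minimal sketch, with a made-up file name `lines.txt` in a temporary folder. Note that `readline()` returns an empty string once the end of the file is reached.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "lines.txt")   # hypothetical file name
    with open(path, 'w') as f:
        f.write("first line\n")
        f.write("second line")                 # last line without newline
    with open(path, 'r') as f:
        line1 = f.readline()
        line2 = f.readline()
        line3 = f.readline()                   # nothing left to read

print(repr(line1), repr(line2), repr(line3))
```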


### Reading and writing numerical data

Now that we know how to read and write to files, there is one thing missing. We can only read and write text strings, and when we want to deal with numerical data we still need to do a conversion between strings and numbers. No matter which programming language you use, this can be quite cumbersome.

In [37]:
Pi=np.pi
with open("test.txt",'w') as f:
    f.write(str(Pi))
    #f.write(Pi) #produces error message
with open("test.txt",'r') as f:
    X=float(f.readline())
print("{0} == {1} is {2}".format(Pi,X,X==Pi))

3.141592653589793 == 3.141592653589793 is True
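
The same idea extends to several numbers if we agree on a simple layout, for instance one number per line. The following is a sketch under that assumption, with made-up data and a made-up file name `numbers.txt`.

```python
import os
import tempfile

values = [0.1, 2.5, 1/3]                       # hypothetical data
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "numbers.txt") # hypothetical file name
    with open(path, 'w') as f:
        for v in values:
            f.write(str(v) + '\n')             # one number per line
    with open(path, 'r') as f:
        # float() ignores the newline at the end of each line
        values_read = [float(line) for line in f]

print(values_read == values)
```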


This looks easy, but if someone gives you a file with numerical data, a lot can go wrong. Let's say the data is in lines which contain spaces and also some other characters (comma, semicolon, etc.) to separate the individual numbers. How can we deal with that? There are several ways to do this, such as parsing a text string using [regular expressions](https://en.wikipedia.org/wiki/Regular_expression) after importing the module [re](https://docs.python.org/3/library/re.html). I will not go into details about this here.
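
Just to give a flavour of what this can look like, here is a minimal sketch using `re.split()`. The example line is made up, and the pattern assumes that numbers are separated by any mix of commas, semicolons, and whitespace. Real data files may need more care.

```python
import re

line = "1.5, 2.25; 3\t4e-1\n"                    # hypothetical line from a data file
# split wherever one or more commas, semicolons or whitespace characters occur
tokens = re.split(r"[,;\s]+", line.strip())
numbers = [float(t) for t in tokens if t]        # skip empty strings, if any
print(numbers)
```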

## Conclusion and Outlook

In this lecture we showed you how to use a github repository for editing notebooks and running Python code. We also introduced the mechanics of doing input/output with Python.