# Advanced Topics for Lecture 4

## Aliasing of Lists

An object that is referred to by more than one variable name is said to be **aliased**. Aliasing affects mutable objects, such as lists, differently from immutable objects, as the example below shows.

In [1]:
requirements = ['Python', 'C', 'Java']
skills = requirements           # Assignment implies aliasing
skills[1] = 'C++'               # Change the 2nd item of skills
print(requirements)             # The list of requirements is also changed
print(skills)

['Python', 'C++', 'Java']
['Python', 'C++', 'Java']


The code segment above shows that the variables `requirements` and `skills` refer to the same list (the same object), so if the list is changed through one variable (e.g. the data item "C" is replaced by "C++"), the change is also visible through the other variable.

You could also use the `is` operator, or compare the identities returned by `id()`, to verify that the two variables refer to the same object.

In [2]:
requirements is skills

True

In [3]:
print(id(requirements))
print(id(skills))

4618228352
4618228352


A list can be imagined as a train, where the variable is like the locomotive and the data items are like the cars. The assignment statement `skills = requirements` does not create another train; instead, it creates a new locomotive called `skills`, which is attached to the same train as `requirements`. As a result, changes made through `requirements` (or `skills`) will also be observed through `skills` (or `requirements`), since both are attached to the same train.

If we want to create a list as a new object, we could use the `copy()` method of lists.

In [4]:
requirements = ['Python', 'C', 'Java']
skills = requirements.copy()    # Create a new copy of the old list
skills[1] = 'C++'               # Change the 2nd item of skills
print(requirements)
print(skills)

['Python', 'C', 'Java']
['Python', 'C++', 'Java']


The `copy()` method creates a new copy of the original list, so `skills` and `requirements` are two different objects, hence the change on one list will not affect the other. 

In [5]:
skills is requirements

False

In [6]:
print(id(skills))
print(id(requirements))

4618306176
4618305920


Besides the `copy()` method, operations such as concatenation, repetition, and slicing also return a new list rather than an alias of the old one. Check the following examples.

In [7]:
requirements = ['Python', 'C', 'Java']
skills = requirements + []      # Create a new copy of the old list

print(requirements is skills)
print(id(requirements))
print(id(skills))

False
4618336704
4618308928


In [8]:
requirements = ['Python', 'C', 'Java']
skills = requirements * 1       # Create a new copy of the old list

print(requirements is skills)
print(id(requirements))
print(id(skills))

False
4618334336
4618228352


In [9]:
requirements = ['Python', 'C', 'Java']
skills = requirements[:]        # Create a new copy of the old list

print(requirements is skills)
print(id(requirements))
print(id(skills))

False
4618354432
4618354240
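
As a further illustration of the same idea, the built-in `list()` constructor is another common way to obtain a new list object. It is not covered in the examples above, so the cell below is only a minimal sketch that reuses the same variable names.

```python
requirements = ['Python', 'C', 'Java']
skills = list(requirements)     # Build a new list from the items of the old one

skills[1] = 'C++'               # Only the new list is changed
print(requirements is skills)   # False: two different objects
print(requirements)             # ['Python', 'C', 'Java']
print(skills)                   # ['Python', 'C++', 'Java']
```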


<div class="alert alert-block alert-danger">
<b>Notes:</b>  
    Though aliasing is sometimes useful in programming, it is error-prone. In general, it is safer to avoid aliasing when you are working with mutable objects, such as lists.
</div>

Here are a few tricky examples of aliasing. 

In [10]:
ones = [1]*3
ones[0] = 2
ones

[2, 1, 1]

It is easy to see that the value of the list <code>ones</code> is <code>[2, 1, 1]</code>, as the first item of <code>ones</code> is changed to 2.

How about the following code, where the items of <code>ones</code> are also lists?

In [11]:
ones = [[1]]*3
ones[0][0] = 2
ones

[[2], [2], [2]]

Surprisingly, all inner lists in <code>ones</code> are changed to be <code>[2]</code>. 

The results can be explained by how nested lists are stored in memory. If we use "trains" to represent the nested list <code>[[1]]</code>, it looks like the following picture.

<img src="https://github.com/XiongPengNUS/test/blob/master/Screen%20Shot%202020-02-11%20at%2011.09.57%20PM.png?raw=true" width=600>

Please note that the item of the outer list is not the inner list (train) itself, but the "locomotive" of the inner train. This idea applies to all nested lists. 

When the list <code>[[1]]</code> is multiplied by three, its only item, the locomotive of the inner train, is replicated three times, as shown by the picture below.

<img src="https://github.com/XiongPengNUS/test/blob/master/Screen%20Shot%202020-02-11%20at%2011.22.09%20PM.png?raw=true" width=600>

It can be seen that only the locomotive of the inner train is replicated, while the train itself remains the same. After the multiplication, list items <code>ones[0]</code>, <code>ones[1]</code>, and <code>ones[2]</code> are **aliases** of the same inner list <code>[1]</code>, or the same object. Check the following code.

In [12]:
print(ones[0] is ones[1])
print(ones[1] is ones[2])

True
True


As a result, if the item of the inner list is changed, this change would apply to all items, <code>ones[0]</code>, <code>ones[1]</code>, and <code>ones[2]</code>, of the outer list. 
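
If three independent inner lists are wanted instead, one possible approach is to create each inner list separately. The sketch below uses a list comprehension, which may not have been introduced yet at this point of the course; it is shown only to contrast with <code>[[1]]*3</code>.

```python
# The expression [1] is evaluated again on every pass,
# so three separate inner lists are created.
ones = [[1] for _ in range(3)]
ones[0][0] = 2

print(ones)                     # [[2], [1], [1]]
print(ones[0] is ones[1])       # False: the inner lists are not aliases
```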

Now check the exercise below.

In [13]:
avengers = ['Spiderman', ['Groot', 'Star-lord'], 'Thor', 'Hulk']
new_avengers = avengers.copy()
avengers[1][0] = 'Baby groot'
avengers[2] = 'Hawkeye'

Can you derive all items of the list <code>new_avengers</code>, before running the following code?

In [None]:
print(new_avengers)
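
The `copy()` method copies only the outer list, so the inner lists of a nested list are still shared between the original and the copy. When a fully independent copy is needed, the standard library function `copy.deepcopy()` can be used. It is not discussed in the text above, and the list `teams` below is a hypothetical example, so treat this as a minimal sketch.

```python
import copy

teams = [['Groot', 'Star-lord'], ['Thor', 'Hulk']]   # Hypothetical nested list
shallow = teams.copy()          # New outer list, same inner lists
deep = copy.deepcopy(teams)     # New outer list and new inner lists

print(shallow[0] is teams[0])   # True: the inner list is aliased
print(deep[0] is teams[0])      # False: the inner list is copied as well

teams[0][0] = 'Rocket'
print(deep)                     # [['Groot', 'Star-lord'], ['Thor', 'Hulk']]
```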

## Lists of strings <a id="subsection2.5"></a>

When dealing with texts, we often need to look into each word, and Python provides convenient tools like `split()` that creates a list of all words, as demonstrated by the code cell below.

In [15]:
song = """
Hey Jude, don't make it bad.
Take a sad song and make it better.
Remember to let her into your heart,
Then you can start to make it better.
"""

words = song.split()
print(words)

['Hey', 'Jude,', "don't", 'make', 'it', 'bad.', 'Take', 'a', 'sad', 'song', 'and', 'make', 'it', 'better.', 'Remember', 'to', 'let', 'her', 'into', 'your', 'heart,', 'Then', 'you', 'can', 'start', 'to', 'make', 'it', 'better.']


The `split()` method, when called without arguments, splits a long string into smaller pieces using whitespace (spaces, tabs, and line breaks) as **delimiters**, i.e., characters that mark the beginning and end of a data piece. Since whitespace is used as the delimiter, each data item is a word, with any attached punctuation included. To remove the punctuation, we can use the `replace()` method discussed in the previous section.

In [16]:
song_clean = song.replace(',', '')          # Remove all commas
song_clean = song_clean.replace('.', '')    # Remove all periods
words = song_clean.split()

print(song_clean)
print(words)


Hey Jude don't make it bad
Take a sad song and make it better
Remember to let her into your heart
Then you can start to make it better

['Hey', 'Jude', "don't", 'make', 'it', 'bad', 'Take', 'a', 'sad', 'song', 'and', 'make', 'it', 'better', 'Remember', 'to', 'let', 'her', 'into', 'your', 'heart', 'Then', 'you', 'can', 'start', 'to', 'make', 'it', 'better']
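
Calling `replace()` once per punctuation mark becomes tedious when many different marks appear. One alternative, sketched below, is to strip punctuation from both ends of each word with the `strip()` method and the constant `string.punctuation` from the standard library. Neither is used in the cells above, and the one-line string `line` is a shortened, hypothetical input.

```python
import string

line = "Hey Jude, don't make it bad."

words = []
for word in line.split():
    # strip() only removes characters at the two ends of the word,
    # so the apostrophe inside "don't" is kept.
    words.append(word.strip(string.punctuation))

print(words)    # ['Hey', 'Jude', "don't", 'make', 'it', 'bad']
```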


Besides splitting texts into lists of words, we can also combine a list of strings into a long string using the `join()` method. Check the code cell below.

In [17]:
words = ['AI', 'machine learning', 'analytics', 'prediction', 
         'inference', 'regression', 'optimization']

text = ' + '.join(words)
print(text)

AI + machine learning + analytics + prediction + inference + regression + optimization


In this example, the `join()` method concatenates all strings in the given list, inserting the `str` type object `' + '` between each pair of adjacent strings.
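
The `split()` and `join()` methods can be seen as inverse operations when the same delimiter is used for both. The sketch below passes an explicit delimiter to `split()`, a usage not shown in the cells above, and checks that joining the pieces restores the original string. The variable names are chosen for illustration only.

```python
text = 'AI + machine learning + analytics'

pieces = text.split(' + ')      # Split on the delimiter ' + ' instead of whitespace
print(pieces)                   # ['AI', 'machine learning', 'analytics']

rebuilt = ' + '.join(pieces)    # Join the pieces with the same delimiter
print(rebuilt == text)          # True
```

Note that `'machine learning'` stays in one piece, because the delimiter here is `' + '` rather than a single space.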