# Python Next Steps - Strings and Lists

## Strings
Strings are sequences of characters, whether letters, numbers, or symbols.  They are enclosed in quotation marks, either single or double.
We have already covered using + to concatenate and * to repeat strings.  This class expands on ways of working with strings.

### Escape Characters
The \ backslash is used in combination with certain characters to form what are called escape characters (also known as escape sequences). Escape characters are typed directly into a string, but serve special purposes.  The following are a few useful ones to know:

**New line break - \n tells Python to insert a line break, allowing you to break up text as it prints**

In [3]:
print("these words\nwill print\nline\nby\nline")

these words
will print
line
by
line


**Tab - \t inserts a tab character, which prints as tab-width spacing.**

In [5]:
print("\tspacing can\tbe\teasier")

	spacing can	be	easier


**Quotes - \\' and \\" let you include a quote mark inside a string that is enclosed in the same style of quote mark.  This is also useful for apostrophes.**

In [7]:
print("Don\'t forget your \"escape characters.\"")

Don't forget your "escape characters."


**Backslash - \\\\ (two backslashes) for when you actually need a backslash in the string, helpful when dealing with Windows file paths in strings.**

In [9]:
print("c:\\Home\\Backslash\\")

c:\Home\Backslash\


### Length
Determining the length of a string can be done with the len() function.  This works on other data types as well.

In [18]:
a = "Here is a string"
len(a)

16
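
len() also gives a quick way to confirm that an escape character counts as a single character once Python has read the string.  The sketch below is only an illustration; the variable names are made up for this example and are not part of the class material.

```python
# Each escape sequence is stored as ONE character, even though we type two.
newline = "\n"
tabbed = "a\tb"
path = "c:\\Home"

print(len(newline))  # 1
print(len(tabbed))   # 3: 'a', a tab, 'b'
print(len(path))     # 7: the doubled backslash is stored as a single backslash
```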

## String Functions

In the intro class we briefly touched upon the fact that each data type has functions of its own.  These functions are built into the 
object class for each data type.  Because these functions belong to the object, in this case a string, they use a slightly different syntax
than we have used so far: the function name follows the object and a dot.  They are also referred to as "methods."  Here are some important ones to know:

In [13]:
a.upper() #the upper and lower methods are useful for normalizing user input for internal use

'HERE IS A STRING'

In [14]:
a.lower()

'here is a string'
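
As a rough sketch of the normalizing idea mentioned above, the example below compares a lowercased answer against "yes".  The variable `answer` and its value are hypothetical stand-ins for text a user might type.

```python
# Hypothetical user input; in a real program this might come from input().
answer = "YeS"

# Lowercasing lets one comparison accept any capitalization.
if answer.lower() == "yes":
    result = "confirmed"
else:
    result = "not confirmed"

print(result)  # confirmed
```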

In [33]:
a.split() #the default is to split at each space character

['Here', 'is', 'a', 'string']

Splitting a string results in a list.  We will cover things you can do with a list later.

In [35]:
a.split("i") #you can specify where the split happens, though that character will not appear in the resulting list

['Here ', 's a str', 'ng']

In [32]:
"-".join(a) #note that the joining character comes first.  Technically this is calling the string method on "-", not variable a

'H-e-r-e- -i-s- -a- -s-t-r-i-n-g'
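
join() is more commonly given a list of strings than a single string.  The sketch below, using illustrative variable names, splits a sentence into words and then joins those words back together, which is one way to see that split() and join() are roughly opposites.

```python
sentence = "Here is a string"

words = sentence.split()     # ['Here', 'is', 'a', 'string']
dashed = "-".join(words)     # joins the WORDS, not the individual characters
rebuilt = " ".join(words)    # joining on a space restores the sentence

print(dashed)                # Here-is-a-string
print(rebuilt == sentence)   # True
```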

In [30]:
a.replace("Here", "This")

'This is a string'

In [31]:
a.replace(" ", "") #You can replace spaces too, this trick essentially removes all internal spaces

'Hereisastring'

In [36]:
a.replace(" ", "", 1) #takes an optional 3rd parameter for the max number of replacements to perform

'Hereis a string'

In [28]:
x = "\tHere is a string with extra whitespaces\t\t"
print(x)

	Here is a string with extra whitespaces		


In [29]:
x.strip() #strip() returns a copy with whitespace removed from the beginning and end of the string

'Here is a string with extra whitespaces'

In Python, strings are immutable.  This means that once created, they cannot be changed.  This lets Python handle memory allocation and
other background processes more simply and consistently.

The methods we just covered do not change the original string.  They are actually returning a new string object.  Because the original is left alone, we were able to call all of these methods on variable a without having to deal with the cumulative changes.  You would have to assign these values to their own variables to keep them.
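
A minimal sketch of what that looks like in practice, using the same example string (the name `shout` is just for illustration):

```python
a = "Here is a string"

shout = a.upper()      # store the returned string in its own variable
print(a)               # Here is a string   (the original is unchanged)
print(shout)           # HERE IS A STRING

a = a.replace("Here", "This")   # or reassign the same name to keep the change
print(a)               # This is a string
```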


## String Indexing

Each character in a string can be individually accessed via an index number.  Python uses zero based indexing, so all counting begins
with the index number of 0.  Also note that all characters, including spaces and punctuation marks form part of the count.

To access or specify a certain character you can use bracket notation.  The string or the variable name is followed by [].
Inside the bracket, you will enter the index number of the character you want.

In [38]:
print(a[0]) #this will print the first character in variable a

H


You can also access characters using negative index numbers.  When doing so, -1 will be the index of the last character in the string,
-2 is the next to last, and so on.  This is useful when you know you want the end of a string, but don't know how long it is or will be.

In [39]:
print(a[-1])

g


If you overshoot the actual length of the string and give an index number that is too high or too low, you will receive an error.

In [40]:
print(a[20])

IndexError: string index out of range
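
One way to avoid this error, sketched below under the assumption that the index arrives in a variable (here the hypothetical `position`), is to compare it against len() before using it.

```python
a = "Here is a string"
position = 20   # hypothetical index that might be too large

# Valid non-negative indexes run from 0 up to len(a) - 1.
last_index = len(a) - 1

if position <= last_index:
    print(a[position])
else:
    print("No character at index", position, "- the last valid index is", last_index)
```

With the values above, the else branch runs because the last valid index is 15.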

### Slicing

Using the same bracket notation, you can give a range of character indexes that will return a "slice," or substring, of the original.
Inside the brackets, enter the first index you want followed by a colon.  Then enter the index of one character past where you want to end.  Like the range() function, the second number is a stop number and isn't actually included in the result.

In [43]:
print(a[5:7]) #returns the characters at indexes 5 and 6 (the 6th and 7th characters)

is


If you don't specify a starting index, it begins at 0.

In [44]:
print(a[:7])

Here is


If you don't specify a stop number, it will slice to the end.

In [45]:
print(a[7:])

 a string
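
Because the stop number is not included, a slice that stops at some index and a slice that starts at the same index never overlap.  The short sketch below (with `n`, `left` and `right` as illustrative names) shows the two halves fitting back together exactly.

```python
a = "Here is a string"
n = 7

left = a[:n]    # indexes 0 through 6
right = a[n:]   # index 7 through the end

print(left)                # Here is
print(left + right == a)   # True: nothing is lost or repeated
```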


If you leave both sides of the colon blank, you get the whole string.

In [46]:
print(a[:])

Here is a string


Negative numbers can be used with slicing too.  Using a negative start number can get you a substring off the end when you don't know
how long your string really is or will be.

In [48]:
print(a[-6:]) #starts 6 characters from the end and then goes to the end

string


Using negative stop numbers works too.  This will stop the slice at some point before the end of the string.

In [49]:
print(a[-6:-3])

str


As with the range function, index slices can take a third number, indicating the step.

In [53]:
b = "Here is a new string"
print(b[::2]) # starts at index 0 and then goes through the entire string, but 2 characters at a time.

Hr sanwsrn


One more trick is reversing a whole string. Leave the first two numbers out and give it a -1 step.

In [54]:
print(b[::-1])

gnirts wen a si ereH
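
As one possible use of the reversing trick, the sketch below checks whether a phrase is a palindrome.  The phrase and the variable names are chosen only for illustration, and the check assumes the text contains just letters and spaces.

```python
phrase = "Never odd or even"   # example text chosen for illustration

# Normalize first: lowercase everything and remove the spaces.
cleaned = phrase.lower().replace(" ", "")

# A palindrome reads the same forwards and backwards.
is_palindrome = cleaned == cleaned[::-1]

print(cleaned)         # neveroddoreven
print(is_palindrome)   # True
```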


## in and not in

The keyword in, optionally combined with the boolean operator not (written as not in), can be used to test whether a string contains a specific character or substring.
The result of such an expression is a boolean, True or False.

In [58]:
my_string = "Hello World!"

h = "Hello" in my_string
n = "something" in my_string

print(h, n)

True False
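
The example above only uses in.  A short sketch of not in, along with the fact that these tests are case sensitive, might look like this (the variable names are illustrative):

```python
my_string = "Hello World!"

missing = "Goodbye" not in my_string   # True: "Goodbye" does not appear
lower_h = "hello" in my_string         # False: the test is case sensitive

print(missing, lower_h)                # True False

# Lowercasing the string first gives a case-insensitive test.
print("hello" in my_string.lower())    # True
```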


## Lists

Lists are another sequence data type.  In other languages you will sometimes see similar structures called arrays.  They are used to collect multiple values or "items."  These items don't have to be the same kind of data.  A list could contain any assortment of numbers, strings, and more.

The key syntax is that a list is contained in square brackets [], just like the string index and slice notation, and the items are separated by commas.

In [98]:
my_list = ["dog", "cat", 1.5, 7, "human"]
print(my_list)


['dog', 'cat', 1.5, 7, 'human']


A list can exist without any items.  This is called an empty list.  Sometimes you will need to create one to be filled in later by other code.

In [97]:
empty = []
print(empty)

[]


The len() function works the same way with lists, giving you the number of items.  Concatenation with + and repetition with * work too.  Remember that when concatenating, both objects must be of the same data type, so a list can only be joined to another list.

In [68]:
print(len(my_list))

print(my_list + ["one more item"])

print(my_list * 3)

5
['dog', 'cat', 1.5, 7, 'human', 'one more item']
['dog', 'cat', 1.5, 7, 'human', 'dog', 'cat', 1.5, 7, 'human', 'dog', 'cat', 1.5, 7, 'human']
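
To illustrate the same-data-type rule, the sketch below adds a single string to a list by wrapping it in square brackets first.  The name `longer` is only for illustration.

```python
my_list = ["dog", "cat", 1.5, 7, "human"]

# my_list + "bird" would raise a TypeError, because "bird" is a string, not a list.
# Wrapping the new item in square brackets makes it a one-item list first.
longer = my_list + ["bird"]

print(longer)    # ['dog', 'cat', 1.5, 7, 'human', 'bird']
print(my_list)   # ['dog', 'cat', 1.5, 7, 'human']  (+ builds a new list)
```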


Like the characters in a string, the items in a list are indexed, so they can be accessed and sliced using the same bracket notation.

In [64]:
print(my_list[2])

1.5


In [65]:
print(my_list[:4])

['dog', 'cat', 1.5, 7]


In [66]:
print(my_list[-1])

human


Unlike strings, lists are mutable.  Changes to the list change the original object.  You can directly reassign the value at a given index.

In [84]:
my_list[2] = "bird"

print(my_list)

['dog', 'cat', 'bird', 7, 'human']


### List Functions
Lists have their own functions, or methods, to accomplish all sorts of tasks.  The same dot notation is used as with strings.

**append()** is used to add items to the end of the list.  Pass the item you want to add as the argument.

In [85]:
my_list.append(10)

print(my_list)

['dog', 'cat', 'bird', 7, 'human', 10]


**insert()** is used to add an item at a particular index without replacing what is currently there.  The items from that index onward are each pushed down one place.

In [86]:
my_list.insert(2, "fish") #the first argument is the index to place it, the second is the item to add

print(my_list)

['dog', 'cat', 'fish', 'bird', 7, 'human', 10]


**pop()** is generally used to remove the last item from the list.  It also returns the removed value, so you can store it in a variable.  You can also pass it a specific index number, and it will remove and return the item at that index instead.

In [87]:
item = my_list.pop()
print(my_list)
print(item)

['dog', 'cat', 'fish', 'bird', 7, 'human']
10


**remove()** also removes items from a list, but instead of an index number, you pass it the value you want removed.  It will remove
the first instance of the value you specify.  Unlike pop(), it does not return the removed value.

In [89]:
my_list.remove("cat")
print(my_list)

['dog', 'fish', 'bird', 7, 'human']


**del** (short for delete) is a keyword that can be used to remove values from a list by index number.  It can also be used to delete a whole slice.  Without an index or slice, it removes the entire list.  This isn't actually a list method, but a built-in keyword, so the syntax is different.

In [90]:
print(my_list)

['dog', 'fish', 'bird', 7, 'human']


In [92]:
del my_list[0]
print(my_list)

['fish', 'bird', 7, 'human']


When used to delete the list, it doesn't just remove the items, leaving an empty list.  It deletes the entire variable.

In [96]:
del my_list
print(my_list)

NameError: name 'my_list' is not defined
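
The index form of pop() and the slice form of del described above are not shown in the cells, so here is a minimal sketch of both.  The list `pets` and the name `second` are made up for this example.

```python
pets = ["dog", "cat", "fish", "bird", "hamster"]   # illustrative list

second = pets.pop(1)    # removes and returns the item at index 1
print(second)           # cat
print(pets)             # ['dog', 'fish', 'bird', 'hamster']

del pets[1:3]           # deletes indexes 1 and 2; the stop index is not included
print(pets)             # ['dog', 'hamster']
```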

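**index()** is used to return the index number of the first occurrence of a given value.  Because my_list was deleted above, it is recreated here first.

In [99]:
my_list = ["dog", "cat", 1.5, 7, "human"] #recreate the list that del removed
num = my_list.index("dog")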

print(num)

0


**count()** returns the number of times a value appears in a list.

In [100]:
dogs = my_list.count("dog")

print(dogs)

1
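
One thing to watch for: index() raises a ValueError when the value is not in the list, while count() simply returns 0.  The sketch below tests with in before calling index(); the names `target` and `position` are illustrative.

```python
my_list = ["dog", "cat", 1.5, 7, "human"]
target = "fish"   # a value that is not in the list

# index() raises a ValueError for a missing value, so test with "in" first.
if target in my_list:
    position = my_list.index(target)
else:
    position = None

print(position)                 # None
print(my_list.count(target))    # 0  (count() is safe to call either way)
```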


As we have seen, most list methods alter the original list object.  When you want to preserve the original, you need to create a copy.
Then you can work on the original or the copy, leaving the other alone.  However, you can't just assign the value of the original list's variable to a new variable.  Once a list is created, any variables assigned to it will point to the same place in system memory.

In [105]:
print("original list: ",my_list)
copy_list = my_list
copy_list.append("new item")
print("original list now: ",my_list)


original list:  ['dog', 'cat', 1.5, 7, 'human']
original list now:  ['dog', 'cat', 1.5, 7, 'human', 'new item']


**copy()** creates a separate, independent copy of the list.  Be sure to store it in its own variable.

In [106]:
copy_list = my_list.copy()
copy_list.append("another new item")
print("my_list: ",my_list)
print("copy_list: ",copy_list)


my_list:  ['dog', 'cat', 1.5, 7, 'human', 'new item']
copy_list:  ['dog', 'cat', 1.5, 7, 'human', 'new item', 'another new item']
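
The difference between a second name for the same list and a real copy can be checked with the is operator, which tests whether two names refer to the very same object.  This is only a sketch with illustrative names; it also shows that a full slice is another way to build a new list.

```python
original = ["dog", "cat"]

alias = original            # a second name for the SAME list
clone = original.copy()     # a separate list with the same items
sliced = original[:]        # a full slice also builds a new list

alias.append("fish")

print(original)             # ['dog', 'cat', 'fish']  (changed through alias)
print(clone)                # ['dog', 'cat']
print(sliced)               # ['dog', 'cat']
print(alias is original)    # True
print(clone is original)    # False
```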


**sort()** sorts the list items in alphabetical or numerical order, changing the original list.  However, it won't work with lists that mix numbers and strings.

In [107]:
my_list.sort()
print(my_list)

TypeError: '<' not supported between instances of 'float' and 'str'

In [108]:
new_list = ["dog", "human", "cat", "fish"]
new_list.sort()
print(new_list)

['cat', 'dog', 'fish', 'human']


If you want to sort the list while leaving the original alone, you can use copy() first.  However, there is a built-in function that does the job in one step.
**sorted()** returns a new, sorted list, which you can assign to a new variable.  Because it isn't a list method, the syntax is different.

In [113]:
numlist = [23, 45, 10, 5]
x = sorted(numlist)

print("original list: ",numlist)

print("x =",x)

original list:  [23, 45, 10, 5]
x = [5, 10, 23, 45]
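
A common slip is to assign the result of sort() to a variable.  Because sort() changes the list in place, it returns None rather than the sorted list.  The sketch below contrasts the two approaches; the names `work`, `result`, `ordered` and `descending` are illustrative.

```python
numlist = [23, 45, 10, 5]

work = numlist.copy()
result = work.sort()    # sort() changes work in place and returns None
print(result)           # None
print(work)             # [5, 10, 23, 45]

ordered = sorted(numlist)                     # a new list, smallest first
descending = sorted(numlist, reverse=True)    # a new list, largest first

print(ordered)          # [5, 10, 23, 45]
print(descending)       # [45, 23, 10, 5]
print(numlist)          # [23, 45, 10, 5]  (the original is untouched)
```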


**reverse()** is a list method.  It reverses the order of the original list.  Use copy() first if you want to keep the original.
You could also assign a reversed slice of the entire list to a new variable.

In [115]:
revlist = x.copy()
revlist.reverse()
print(revlist)

[45, 23, 10, 5]


In [116]:
print(x[::-1])

[45, 23, 10, 5]


## Looping Through Lists
Now we can combine what we know about strings, lists, and loops to perform a wider array of tasks.

In [117]:
a_string = "Hello world my name is David!"

In [118]:
split_string = a_string.split()
print(split_string)

['Hello', 'world', 'my', 'name', 'is', 'David!']


In [119]:
short_words = []

for x in split_string:
    if len(x) < 3:
        short_words.append(x)
        
print(short_words)

['my', 'is']


**Another Example**

In [125]:
rev_words = []
for x in split_string:
    rev_words.append(x[::-1])
    
print(" ".join(rev_words))

olleH dlrow ym eman si !divaD
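
The same pattern also works with the in keyword and string methods.  As one more sketch (the list name `words_with_a` is made up for this example), the loop below collects every word that contains the letter "a" in either case.

```python
a_string = "Hello world my name is David!"

words_with_a = []
for word in a_string.split():
    # lower() makes the test match both "a" and "A"
    if "a" in word.lower():
        words_with_a.append(word)

print(words_with_a)        # ['name', 'David!']
print(len(words_with_a))   # 2
```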
