# Python Next Steps - Strings and Lists

## Strings
Strings are sequences of characters whether letters, numbers or symbols.  They are enclosed in quotation marks.
We have already covered using + to concatenate and * to repeat strings.  This class expands on ways of working with strings.

### Escape Characters
The \ blackslash is used in combination with certain characters to form what are called escape characters. Escape characters are inserted directly into a string, but serve special purposes.  The following are a few useful ones to know:

**New line break - \n tells Python to insert a line break, allowing you to break up text as it prints**

In [1]:
print("these words\nwill print\nline\nby\nline")

these words
will print
line
by
line


**Tab - \t lets you insert tab length spaces.**

In [5]:
print("\tspacing can\tbe\teasier")

	spacing can	be	easier


**Quotes - \\' and \\" - if you want to force Python to allow a certain quote mark style in a string and for use with apostrophes.** 

In [7]:
print("Don\'t forget your \"escape characters.\"")

Don't forget your "escape characters."


**Backslash - \\ for when you actually need a backslash in the string, helpful when dealing with paths or urls in strings.**

In [9]:
print("c:\\Home\\Backslash\\")

c:\Home\Backslash\


### Length
Determining the length of a string can be done with the len() function.  This is a built in function that works on other data types as well.

In [2]:
a = "Here is a string"
len(a)

16

Dont forget that spaces are characters too!

## String Functions

In the intro class we briefly touched upon the fact that each data type has functions of their own.  These functions are built into the 
object classes for each data type.  Because these functions belong to the object, in this case strings, they use a slightly different syntax
than we have used so far, sometimes called "dot notation." The string or variable name is followed by a period then the function call.  These types of functions are also referred to as "methods."  

### Strings are immutable
Don't forget that string objects can't be changed.  There are various reasons Python has implemented strings in this fashion.  Methods that make apparent alterations actually create new string objects.  These will need to be saved in a variable to preserve them for future use or simply used in the context of another function.  Running these replace functions will not change the original string.  That is why we can call all of these functions without seeing any cumulative changes.

In [13]:
a.upper() #the upper and lower methods are useful for normalizing user input for internal use

'HERE IS A STRING'

In [14]:
a.lower()

'here is a string'

And there are several other capitalization functions: 
capitalize() for capitalizing the first word of a string
title() for capitalizing each word
swapcase() for inverting the case of each letter in a string

In [33]:
a.split() #the default is to split at each space character

['Here', 'is', 'a', 'string']

Splitting a string results in a list.  We will cover things you can do with a list later.  Do note that splitting a single word string or an entire string into individual letters requires another technique.

In [35]:
a.split("i") #you can specify where the split happens, though that character will not appear in the resulting string

['Here ', 's a str', 'ng']

using .split(",") could be useful if you have imported data in the form of comman separated values.

In [3]:
"-".join(a) #note that the joining character comes first.  Technically this is calling the join method on the string "-", not variable a

'H-e-r-e- -i-s- -a- -s-t-r-i-n-g'

In [30]:
a.replace("Here", "This")

'This is a string'

In [31]:
a.replace(" ", "") #You can replace spaces too, this trick essentially removes all internal spaces

'Hereisastring'

In [36]:
a.replace(" ", "", 1) #takes an optional 3rd parameter for the max number of replacements to perform

'Hereis a string'

In [56]:
x = "\tHere is a string with extra whitespaces\t\t"
print(x)

	Here is a string with extra whitespaces		


In [57]:
nospace = x.strip() #strip() removes whitespace from the begining and/or end of the string
print(nospace)

Here is a string with extra whitespaces


Just to be complete, there are also lstrip() and rstrip() functions that remove whitespace only from the left or right sides.
For all three strip methods you can also pass in an argument if you want to remove some other character at the edges.  This could even be part of the string that is at one end.

In [59]:
print(nospace.rstrip("ces"))

Here is a string with extra whitespa


## String Indexing

There will be times when you want to act upon a specific character of a string.  Each character in a string can be individually accessed via an index number.  Python uses zero based indexing, so all counting begins
with the index number of 0.  Also note that all characters, including spaces and punctuation marks form part of the count.

To access or specify a certain character you use bracket notation.  The string or the variable name is followed by [].
Inside the bracket, you will enter the index number of the character you want.  This sub-string object could be saved in a new variable or used within a function.

In [4]:
print(a[0]) #this will print the first character in variable a

H


You can also access characters using negative index numbers.  When doing so, -1 will be the index of the last character in the string,
-2 is the next to last, and so on.  This is useful when you know you want the end of a string, but don't know how long it is or will be.

In [39]:
print(a[-1])

g


If you overshoot the actual length of the string and give an index number that is too high or too low, you will receive an error.

In [40]:
print(a[20])

IndexError: string index out of range

The above technique will return the value at an index.  However, it can also be useful to get the index of a known value.

In [64]:
print(a)
print(a.find("is")) # will return the index where this sub-string begins
print(a.find("error")) # will return -1 if value is not found in the string

Here is a string
5
-1


In [65]:
print(a)
print(a.index("is")) # acts almost identically to find()
print(a.index("error")) #except gives an error if value is not found in the string

Here is a string
5


ValueError: substring not found

Keep in mind these index finding tools start at the beginning and stop at the first occurrence of the value sought.  To find a later occurrence or all occurrences will require other techniques.

### Slicing

Using the same bracket notation, you can create a range of characater indexes that will give you a "slice" or sub-string of the original.
Inside the brackets enter the first index you want followed by a colon.  Then enter the index of one character past where you want to end.  Like the range() function, the second number is a stop number and isn't actually included in the result.
Absorbing the concepts of 0 based indexing and this use of stop numbers tend to be important hurdles for beginners.

In [43]:
print(a[5:7]) #returns the third through 9th characters.

is


If you don't specify a starting index, it begins at 0

In [44]:
print(a[:7])

Here is


If you don't specify a stop number, it will slice to the end.

In [45]:
print(a[7:])

 a string


If you leave both sides of the colon blank, you get the whole string

In [46]:
print(a[:])

Here is a string


Negative numbers can be used with slicing too.  Using a negative start number can get you a substring off the end when you don't know
how long your string really is or will be.  The last character is at index -1, the second to last at -2, and so on.  This can feel confusing when you have just learned to start counting from the left with 0.

In [48]:
print(a[-6:]) #starts 6 characters from the end and then goes to the end

string


Using negative stop numbers works too.  This will stop the slice at some point before the end of the string

In [49]:
print(a[-6:-3]) #gives the sixth to last through the fourth to last since -3 is the stop number.

str


In [80]:
print(a[:30]) #unlike accessing individual characters, using an index number outside the length of the string will not error.

Here is a string


As with the range function, index slices can take a third number, indicating the step.  It will begin with the first index given, then moves the step number of index places until it gets to or passes the stop number.

In [74]:
b = "Here is a new string"
print(b[::2]) # starts at index 0 and then goes through the entire string, giving every other character
print(b[1::2]) #will start at the second character and every other character from that point

Hr sanwsrn
eei  e tig


Don't forget that spaces are indexed characters too.  If you really just want every other letter, you will need another technique.  There are multiple ways of doing this, but a simple one based on what we've covered is to first use the replace() function to replace spaces " " with empty strings "".  Then take a slice of that new, space free string.  Because of the object oriented nature of Python, you can do it in one step.

In [75]:
b.replace(" ", "")[::2]

'Hriaesrn'

One more trick is reversing a whole string using a negative step number. Leave the first two numbers out and give it a -1 step.

In [54]:
print(b[::-1])

gnirts wen a si ereH


Howevever, you can also use negative start, stop and step numbers to get a reversed sub-string.  Just remember that the character at the stop number index isn't included in the result and a negative step number is required to move backwards.

In [76]:
print(b[-1:-7:-1]) #gives the last 6 characters in reverse order.
print(b[-1:-7]) #doesn't work, but doesn't give an error.  It returns an empty string

gnirts



Index numbers can also be generated by functions

In [72]:
print(b[b.find("is"):]) #slices a string starting with the index number returned from a find() function

is a new string


## in and not in

The keyword "in" or combined with the boolean operator "not", can be used to test if a string contains a specific character or substring.
The result of such an expression will be a boolean, True or False.

In [9]:
my_string = "Hello World!"

h = "Hello" in my_string
n = "something"in my_string

print(h, n)

True False


## String Formatting
There are a number of techniques related to creating and printing strings to include values from variables or lists.  The preferred method has changed as newer versions of Python have been released.  Currently "f strings" are in fashion.  They syntax is fairly simple and is at least helpful to recognize if you should run across it.

There is a letter f just outside the opening quotation mark.  Any place inside the string can have an external value inserted by entering the variable name or function inside {} brackets.  In this way, text can easily be changed based on different input.

In [73]:
name = "David"
word = "welcome"
print(f"Hello, {name}, you are {word}")

Hello, David, you are welcome


## Lists

Lists are another sequence data type.  You will also sometimes see them called arrays.  They are used to collect multiple values or "items." These items don't have to be the same kind of data.  It could contain any assortment of numbers, letters, strings, and more.

The key syntax is that a list is contained in square brackes [] just like the string slice notation and the items are comma separated.

In [18]:
my_list = ["dog", "cat", 1.5, 7, "human"]
print(my_list)


['dog', 'cat', 1.5, 7, 'human']


A list can exist with any number of items, including no items.  Lists with no items are called an empty list.  Sometimes you will need to create these to be filled later by other code.

In [97]:
empty = []
print(empty)

[]


The len() function works the same with lists to give you the number of items.  Concatenation with + and repetition with * do too.  Remember when concatenating the objects must be of the same data type, so anything you want to add this way must be contained in a list too, even single items.

In [68]:
print(len(my_list))

print(my_list + ["one more item"])

print(my_list * 3)

5
['dog', 'cat', 1.5, 7, 'human', 'one more item']
['dog', 'cat', 1.5, 7, 'human', 'dog', 'cat', 1.5, 7, 'human', 'dog', 'cat', 1.5, 7, 'human']


Like strings, the items in lists are indexed, so can be accessed and sliced just like string using the same notation.
Slice notation works the same as well.

In [64]:
print(my_list[2])

1.5


In [65]:
print(my_list[:4])

['dog', 'cat', 1.5, 7]


In [66]:
print(my_list[-1])

human


Unlike strings, lists are mutable.  Changes to the list change the original object.  For instance, you can directly reassign the value at a given index.

In [11]:
my_list[2] = "bird"

print(my_list)

['dog', 'cat', 'bird', 7, 'human']


### List Functions
List have their own functions or methods to accomplish all sorts of tasks.  The same dot notation is used as with strings.  Just keep in mind that these will change the original list, even if you try to assign them as values to a new variable.  Also remember they don't work on strings.

**append()** is used to add items to the end of the list.  Pass the item you want to add as the argument.

In [85]:
my_list.append(10)

print(my_list)

['dog', 'cat', 'bird', 'bird', 7, 'human', 10, 10]


**insert()** is used to add items at a particular index but without replacing what is currently there.  Other items are just pushed down one index place.

In [86]:
my_list.insert(2, "fish") #the first argument is the index to place it, the second is the item to add

print(my_list)

['dog', 'cat', 'fish', 'bird', 'bird', 7, 'human', 10, 10]


**pop()** is used without any argument to remove the last item from the list.  Unlike the previous methods, this function also returns something in addition to altering the list.  It returns the removed value, so you could store it in a variable, etc. too.  However, you can also pass pop() a specific index number and it will remove and return that value.

In [16]:
print(my_list)
item = my_list.pop()
print(item)
print(my_list)


['dog', 'cat', 1.5]
1.5
['dog', 'cat']


**remove()** also removes items from a list, but instead of an index number, you pass it the value you want removed.  This is helpful when you don't know where in the list a specific value is located.  It will remove
the first instance of the value you specify.  It does not return that value like pop().  Trying to remove an item that doesn't exist in the list will result in an error.

In [19]:
my_list.remove("cat")
print(my_list)

['dog', 1.5, 7, 'human']


The **del** keyword (short for delete) can be used to remove values from a list by index number.  It can be used to delete a whole slice.  Without an index or slice, it removes the entire list.  This isn't actually a list method, but a built in keyword, so the syntax is different.

In [90]:
print(my_list)

['dog', 'fish', 'bird', 7, 'human', 10]


In [92]:
del my_list[0]
print(my_list)

['bird', 7, 'human', 10]


When used to delete the list, it doesn't just remove the items, leaving an empty list.  It deletes the entire variable. It can be used to delete variables containing other data types too.  It cannot delete immutable objects like strings directly, but it can delete a variable containing one.

In [96]:
del my_list
print(my_list)

NameError: name 'my_list' is not defined

The next two functions don't alter a list but return something from it.

**index()** is used to return the index number of a given value.

In [99]:
num = my_list.index("dog")

print(num)

0


**count()** returns the number of times a value appears in a list.

In [100]:
dogs = my_list.count("dog")

print(dogs)

1


As we have seen, most list methods alter the original list object.  When you want to preserve the original, you need to create a copy.
Then you can work on the original or the copy, leaving the other alone.  However, you can't just assign the value of the original list's variable to a new variable.  This works for strings and integers, but not lists.  Once a list is created, any new variables assigned to that same value will point to the same place in system memory.  Those new variable names would essentially be aliases for the same piece of data.

In [105]:
print("original list: ",my_list)
copy_list = my_list
copy_list.append("new item")
print("original list now: ",my_list)


original list:  ['dog', 'cat', 1.5, 7, 'human']
original list now:  ['dog', 'cat', 1.5, 7, 'human', 'new item']


**copy()** luckily this function exists to create a true, independant copy of the list.  Be sure to store it in its own variable.

In [106]:
copy_list = my_list.copy()
copy_list.append("another new item")
print("my_list: ",my_list)
print("copy_list: ",copy_list)


my_list:  ['dog', 'cat', 1.5, 7, 'human', 'new item']
copy_list:  ['dog', 'cat', 1.5, 7, 'human', 'new item', 'another new item']


**sort()** sorts list items in alpha or numerical order.  However, it won't work with lists that mix numbers and strings.

In [107]:
my_list.sort()
print(my_list)

TypeError: '<' not supported between instances of 'float' and 'str'

In [108]:
new_list = ["dog", "human", "cat", "fish"]
new_list.sort()
print(new_list)

['cat', 'dog', 'fish', 'human']


If you want to sort the list while leaving the original alone, you can use copy(), however there is another built in function that does the job.
**sorted()** returns a new list sorted, so you can also assign it to a new variable.  Because it isn't a list method, the syntax is different.

In [113]:
numlist = [23, 45, 10, 5]
x = sorted(numlist)

print("original list: ",numlist)

print("x =",x)

original list:  [23, 45, 10, 5]
x = [5, 10, 23, 45]


**reverse()** is a list method.  It reverses the order of the original list.  Use copy() first if you want to keep the original.
You could also assign a reversed slice of the entire list to a new variable.

In [115]:
revlist = x.copy()
revlist.reverse()
print(revlist)

[45, 23, 10, 5]


In [116]:
print(x[::-1])

[45, 23, 10, 5]


## Looping Through Lists
Now you can combine what we know about strings, lists and loops to perform a wider array of tasks.

In [23]:
a_string = "Hello world my name is David!"

In [26]:
split_string = a_string.split()
print(split_string)

['Hello', 'world', 'my', 'name', 'is', 'David!']


In [119]:
short_words = [] #creating an empty list that the loop will fill.

for x in split_string:
    if len(x) < 3: #finds all words in the list less than 3 characters long
        short_words.append(x) #appends those words to the list we created above
        
print(short_words)

['my', 'is']


In [29]:
rev_words = []
for x in split_string:
    rev_words.append(x[::-1]) #appends each words in split_string, but individually reversed, to the new rev_words list
    
print(" ".join(rev_words)) #creates a string from the rev_words list.

olleH dlrow ym eman si !divaD


### Splitting a string into characters/letters
To split a single word string into a list of characters, you can use the built in list() function


In [21]:
letters = list("abcdefg")
print(letters)

['a', 'b', 'c', 'd', 'e', 'f', 'g']


### Splitting numbers into digits
Integers cannot be directly split into a list of digits like a string.  Strings are "iterable" objects, meaning each value or character is something that can be individually accessed and looped over.  Integers are designed to represent full, sigular values and are not treated as iterable.

In [2]:
nums = 122432523

In [3]:
numlist = list(nums)
print(numlist)

TypeError: 'int' object is not iterable

However, you can easily convert an integer into a string using the str() function, then use the list function.  Below we have nested the two together.  The inner most str() function returns a string first, then the outer list() function splits it into a list.

In [4]:
numlist = list(str(nums))
print(numlist)

['1', '2', '2', '4', '3', '2', '5', '2', '3']


Convert them back into integers using a loop or list comprehension:

In [5]:
intlist = []
for x in numlist:
    intlist.append(int(x))
print(intlist)

[1, 2, 2, 4, 3, 2, 5, 2, 3]


In [7]:
intlist = [int(x) for x in numlist]
print(intlist)

[1, 2, 2, 4, 3, 2, 5, 2, 3]
