# 12 Strings
File(s) needed: Boston_housing_SMALL.csv, gettysburg.txt

Strings are a type of sequence, like lists and tuples, so many of the same operations work on them. The `len()` function, for example, works on all three and is needed regularly.

You can access the individual characters in a string in two ways: with a loop or an index. 

In [None]:
# Use a for loop to cycle through all of the characters in a string
# like elements in a list.
name = 'Eric Idle'
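
One way to complete the loop, as a minimal sketch: the `for` statement hands back one character at a time, spaces included. The list `letters` is an illustrative name added here only so the result can be inspected afterwards.

```python
name = 'Eric Idle'

# Visit each character in order, exactly as with the elements of a list.
letters = []
for ch in name:
    print(ch)
    letters.append(ch)

# The space between the two words counts as a character too.
print(len(letters))
```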


In [None]:
# Use indexing to reference each letter, also like a list.
# Negative index values work the same way, too.
# The len() function is also valid with strings.
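
A possible version of the indexing cell, offered as a sketch rather than the only answer. The variable names (`first`, `last`, `also_last`) are illustrative.

```python
name = 'Eric Idle'

first = name[0]                    # index 0 is the first character
last = name[-1]                    # -1 counts back from the end
also_last = name[len(name) - 1]    # the last valid index is len(name) - 1

print(first, last, also_last)

# Walk the string by position instead of by character.
for i in range(len(name)):
    print(i, name[i])
```

As with lists, an index outside the valid range, such as `name[len(name)]`, raises an `IndexError`.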


## Concatenation
Concatenation combines strings by using the `+` operator. When both operands are strings, `+` is the concatenation operator (not the addition operator).

We can also use the `+=` augmented assignment operator with strings. It provides a good way to build output or display strings.

In [None]:
# This is a common operation when displaying names.
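
A short sketch of both operators, using made-up variable names (`first_name`, `last_name`, `greeting`) for illustration:

```python
first_name = 'Eric'
last_name = 'Idle'

# + joins strings end to end; the space has to be added explicitly.
full_name = first_name + ' ' + last_name
print(full_name)

# += builds a longer string one piece at a time and reassigns the name.
greeting = 'Hello, '
greeting += full_name
greeting += '!'
print(greeting)
```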


Using `+=` makes it look like you are changing the string, but you are not: Python builds a new string and assigns it to the same name. **Strings are immutable**. Because of that, you can't use an index on the left side of an assignment operator.

In [5]:
# To confirm, see what happens if you try to change a letter in the string.
name = 'Eric Idle'
name[2] = 'z'

TypeError: 'str' object does not support item assignment
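
The following self-contained sketch illustrates both points about immutability. The second reference `original` and the variable `error_message` are illustrative additions.

```python
name = 'Eric Idle'
original = name      # a second reference to the same string

name += '!'          # builds a new string and rebinds name to it

# The string that original refers to was never modified.
print(original)
print(name)

# Assigning to a single position is rejected outright.
try:
    name[2] = 'z'
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)
```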

## String slicing
A **slice** is a span of items taken from a sequence, in this case from a string. The general form is `string[start:end]`, just as with lists. The expression returns a new string containing a copy of the characters from `start` up to, but not including, `end`.

You have all of the same functionality we talked about with lists.
- If you leave out the start index, Python uses 0.
- If you leave out the end index, Python goes to the end of the string.
- If you specify a negative number for the start index, Python will start that many positions before the end of the string.
- If you leave out both start and end (i.e., you just have `[:]`), the whole string is copied.
- You can also specify a step value to slice nonconsecutive characters.


In [None]:
# String slicing examples
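
A few slices that exercise each rule in the list above, as a sketch. The variable names are illustrative.

```python
name = 'Eric Idle'

first_word = name[:4]      # start omitted, so Python begins at index 0
second_word = name[5:]     # end omitted, so Python runs to the end
last_three = name[-3:]     # negative start counts back from the end
whole = name[:]            # both omitted: the entire string
every_other = name[::2]    # a step of 2 picks every second character

print(first_word)
print(second_word)
print(last_three)
print(whole)
print(every_other)
```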


## Testing and searching strings
Use the `in` and `not in` operators to test if one string is found in another.

In [4]:
# Example using in operator
text = 'Your mother was a hamster and your father smelt of elderberries.'
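
One way to finish this example, as a sketch. The search words are arbitrary choices for illustration.

```python
text = 'Your mother was a hamster and your father smelt of elderberries.'

if 'hamster' in text:
    print('Found "hamster".')

if 'gerbil' not in text:
    print('No "gerbil" here.')

# The test is case-sensitive: 'your' appears in the text, 'YOUR' does not.
has_lower = 'your' in text
has_upper = 'YOUR' in text
print(has_lower, has_upper)
```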



We can also test strings for various properties using built-in methods. They are methods because _**a string is an object**_, and objects have associated methods included in their definitions.

- isdigit() – True if every character is a digit
- isalnum() – True if every character is an alphabetic letter or a digit
- isalpha() – True if every character is an alphabetic letter
- islower() & isupper() – check the case of the alphabetic characters


In [None]:
# example with isdigit()
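
A sketch of how these tests behave, including a common use: checking input before converting it. The value in `entry` stands in for something a user might type and is purely illustrative.

```python
# isdigit() is True only when every character is a digit.
year_ok = '2024'.isdigit()         # True
price_ok = '3.14'.isdigit()        # False: the period is not a digit
negative_ok = '-5'.isdigit()       # False: the minus sign is not a digit

# isalnum() allows letters and digits; isalpha() allows letters only.
code_alnum = 'abc123'.isalnum()    # True
code_alpha = 'abc123'.isalpha()    # False

# islower() and isupper() look only at the letters.
shout = 'HELLO 42'.isupper()       # True
mixed = 'Hello'.islower()          # False

# All of these methods return False for an empty string.
empty = ''.isdigit()               # False

# Validate before converting so that int() cannot fail.
entry = '42'
if entry.isdigit():
    number = int(entry)
else:
    number = None
print(number)
```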


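There are methods for searching for and replacing parts of strings as well.
- endswith(substring) – True if the string ends with the substring
- find(substring) – the lowest index where the substring starts, or -1 if it is not found
- replace(old, new) – a copy of the string with every occurrence of old replaced by new
- startswith(substring) – True if the string starts with the substring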
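
A brief sketch of the four methods applied to the sample sentence used earlier. The variable names are illustrative.

```python
text = 'Your mother was a hamster and your father smelt of elderberries.'

starts = text.startswith('Your')
ends = text.endswith('.')
print(starts, ends)

# find() returns the index of the first match, or -1 when there is none.
position = text.find('hamster')
missing = text.find('gerbil')
print(position, missing)

# replace() returns a new string; text itself is unchanged.
changed = text.replace('hamster', 'gerbil')
print(changed)
```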

We can also use the repetition operator (`*`) like we did with lists.

In [3]:
# Example: repetition operator
mascot = 'Bears'
print(mascot)
print(mascot * 5)


Bears
BearsBearsBearsBearsBears


Because strings are immutable, the string modification methods return a modified copy of the string and leave the original unchanged.
- lower() and upper() – change the entire string to the specified case
- strip() – removes all leading and trailing whitespace
- rstrip() & lstrip() – strip the whitespace from the right or left ends respectively
- strip(char) – strips all instances of the specified character from both ends. If more than one character is given, any of them is stripped.
- rstrip(char) & lstrip(char) – strip all instances of the specified character from the right or left ends respectively.
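
The sketch below runs each of these methods on small made-up strings and confirms that the original is left alone.

```python
mascot = 'Bears'
print(mascot.lower(), mascot.upper())
print(mascot)          # still 'Bears': the methods returned copies

raw = '   Hello, World!\n'
both = raw.strip()     # whitespace removed from both ends
right = raw.rstrip()   # trailing newline removed, leading spaces kept
left = raw.lstrip()    # leading spaces removed, trailing newline kept

# With an argument, the named characters are stripped instead of whitespace.
starred = '***Bears***'.strip('*')
trimmed_right = 'xxBearsxx'.rstrip('x')
trimmed_left = 'xxBearsxx'.lstrip('x')

# To keep a result, assign it to a name.
cleaned = raw.strip().upper()
print(cleaned)
```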


## Splitting strings
Python has a method called `split()` built into the string object type. We can use it to break a string into pieces and store the pieces in a list.

By default, `split()` separates on whitespace (spaces, tabs, and newlines) and treats a run of consecutive whitespace characters as a single separator.

In [1]:
# Example: split on spaces

# The text string used before contains multiple words divided by spaces
text = 'Your mother was a hamster and your father smelt of elderberries.'

# Split the string and store the results in a list
words = text.split()

# Print the list to the screen.
print(words)


['Your', 'mother', 'was', 'a', 'hamster', 'and', 'your', 'father', 'smelt', 'of', 'elderberries.']
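
Calling `split()` with no argument is not the same as calling `split(' ')`. The sketch below, using a made-up string with extra spaces, shows the difference.

```python
spaced = 'john   william  smith'

# No argument: any run of whitespace counts as one separator.
default_split = spaced.split()
print(default_split)

# Explicit ' ': every single space is a separator, so extra spaces
# produce empty strings in the result.
space_split = spaced.split(' ')
print(space_split)
```

The no-argument form is usually the better choice for text typed by a user, because extra spaces do not produce empty pieces.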


We can also specify a character (or characters) to use as a separator. The forward slash (`/`) is often used in dates, the hyphen (`-`) is used in phone numbers, and the comma (`,`) is used in CSV files. These are not the only possibilities, however. Any valid character(s) we need to use can be designated as a separator.

In [None]:
# Example: splitting date strings
date_string = '9/4/1981'

# Split the date


# Display each part of the date.
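
One possible completion of this cell, as a sketch. It assumes the date is written in month/day/year order, which the string itself does not guarantee, and the variable names are illustrative.

```python
date_string = '9/4/1981'

# Split the date on the forward slash.
date_parts = date_string.split('/')
print(date_parts)

# Display each part of the date.
# Assumption: the string is in month/day/year order.
month, day, year = date_parts
print('Month:', month)
print('Day:', day)
print('Year:', year)

# The pieces are still strings; convert them before doing arithmetic.
year_number = int(year)
```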


In [2]:
# Example: splitting text using a word as a separator
myText = 'dogdudecatdudepigdudegoatdudellamadudeparrot'
myList = myText.split('dude')
print(myList)

['dog', 'cat', 'pig', 'goat', 'llama', 'parrot']


## Example

The following example combines several of these operations to make data from a file readable. It reads a CSV file as input and writes the results to a text file.

We will use the comments already in the code as an outline for creating the program.

The first thing we should do before writing any code is to look at the data to see what we are working with.

In [None]:
# Example: Splitting strings and cleaning the results using a csv file.
# This example uses the file Boston_housing_SMALL.csv as input.

# Open source file and result file


# Read the first line of data containing the field names.
# Use the strip() method to make sure we don't get any
#   unwanted white space in the results.



# Loop through the list created by the split() method.
# Write variable names to the result file.


# Loop through the rest of the data, clean it, and write to the result file.


# Close the files
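
The outline above could be filled in along the following lines. This is a sketch under stated assumptions, not the definitive solution: the function name `make_readable`, the output layout, and the sample rows are all made up for illustration, and a small temporary file stands in for Boston_housing_SMALL.csv, whose actual columns are not shown here.

```python
import os
import tempfile

def make_readable(source_path, result_path):
    """Rewrite a CSV file as labeled text, one value per line."""
    # Open source file and result file.
    source = open(source_path, 'r')
    result = open(result_path, 'w')

    # Read the first line (the field names) and strip the newline.
    header = source.readline().strip()
    fields = header.split(',')

    # Write the variable names to the result file.
    result.write('Fields:\n')
    for field in fields:
        result.write('  ' + field.strip() + '\n')

    # Loop through the rest of the data, clean it, and write it out.
    for line in source:
        line = line.strip()
        if line == '':
            continue                      # skip blank lines
        values = line.split(',')
        result.write('\n')
        for field, value in zip(fields, values):
            result.write(field.strip() + ': ' + value.strip() + '\n')

    # Close the files.
    source.close()
    result.close()

# Demonstration with hypothetical data in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, 'sample.csv')
    dst = os.path.join(folder, 'sample.txt')
    with open(src, 'w') as f:
        f.write('id, rooms, price\n1, 6.5, 24.0\n2, 5.9, 21.6\n')
    make_readable(src, dst)
    with open(dst) as f:
        report = f.read()

print(report)
```

Splitting on commas works only when no field contains a comma of its own. For files with quoted fields, Python's `csv` module is the safer tool.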




## Programming Exercise
Write a program that gets one string from the user containing a person's first, middle, and last names, and then displays their first, middle, and last initials (with periods). For example, if the user enters "john william smith" the program should display "J. W. S." You can assume the user types one space between names.

##### Add a degree of difficulty
Make your code work no matter how many spaces are input.

In [10]:
name = input("Please enter your full name: ")

words = name.split()

# Take the first letter of each word, capitalize it, and join with periods.
print(". ".join(word[0].upper() for word in words) + ".")


Please enter your full name:Justin Case McDonald
J. C. M.


## Programming Exercise
Write a program that counts the number of sentences and words in President Lincoln's Gettysburg Address. Also calculate the average number of words per sentence. Display your results to the screen.

Use the file gettysburg.txt as input. 

In [40]:
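file = open('gettysburg.txt')

# readline() returns only the first line, so this assumes the whole
# address is stored on a single line. Use read() if it spans several lines.
all_text = file.readline()
file.close()

# Treat a period followed by a space as the end of a sentence.
sentences = all_text.split('. ')
print('Number of sentences: ', len(sentences))

# Count the words in each sentence and keep a running total.
num_words = 0
for sentence in sentences:
    words = sentence.split()
    num_words += len(words)

print('Number of words: ', num_words)

print('Average words per sentence: ', num_words/len(sentences))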


Number of sentences:  10
Number of words:  264
Average words per sentence:  26.4
