# String Manipulation

Now that we know about Lists, we can revisit strings and take a closer look at ways to use them.

A string behaves much like a list of characters (although, unlike a list, it cannot be changed in place). For instance, the string "Hello World" is a sequence of 11 characters. Just like with a list, we can access its elements by index. The following code prints the character at index 0 in "Hello World", which is "H":

In [0]:
my_string = "Hello World"
print(my_string[0])

### Concatenation

Strings can be added together just like lists. For example, if we take the string "Hello" and add it to the string "World":

In [0]:
my_string = "Hello" + "World"
print(my_string)

Notice that this doesn't add a space in between the two strings. You must include this yourself either at the end of the first string or at the beginning of the second string.


Strings also have a lot of built-in methods in Python that make using them much easier:

### Count

`count()` returns the number of times a given substring occurs within your string. Let's say you need to count the number of times that "g" shows up in a string that the user inputs. The syntax for this is `string_name.count("g")`.
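As a minimal sketch of `count()` in use, the example below counts "g" in a fixed string. The variable names and the sample text are assumptions made for illustration; in a real program the text could come from `input()`.

```python
# Hypothetical sample text standing in for something the user typed.
user_text = "giggling geese"

# Count how many times "g" appears.
g_count = user_text.count("g")
print(g_count)  # 5

# count() is case-sensitive, so the capital "G" is not counted here.
mixed_case_count = "Giggle".count("g")
print(mixed_case_count)  # 2
```

If case should not matter, one option is to call `lower()` on the string before counting.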

### Slicing

Slicing lets you take a small chunk of characters out of a string and store it in a new string. Syntactically, slicing is similar to indexing. The basic form is **[start : end]** where start and end are indexes in the string. For example, `my_string[0:3]` will give you a string of the first three characters in my_string. The following code should print "Doct":

In [0]:
my_string = "Doctor"
new_string = my_string[0:4]
print(new_string)

Notice that the string that is returned will include the character at the start index, but will only include up to the character at one less than the end index. Not passing in a value for start will automatically make it 0, and not passing in a value for end will automatically make it the last index in the string + 1.

The indexes can also be negative, so `my_string[-3:]` will return the last three characters of the string and `my_string[:-3]` will give you all but the last three characters of the string.
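The slicing rules above, including omitted and negative indexes, can be sketched as follows. The variable names are only illustrative.

```python
my_string = "Doctor"

# Omitting start begins at index 0; omitting end runs to the end of the string.
first_two = my_string[:2]
from_index_two = my_string[2:]

# Negative indexes count back from the end of the string.
last_three = my_string[-3:]
all_but_last_three = my_string[:-3]

print(first_two)           # Do
print(from_index_two)      # ctor
print(last_three)          # tor
print(all_but_last_three)  # Doc
```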

### Split

`split()` allows you to turn a string into a list of strings. The original string will be split apart at every occurrence of the string that is passed into `split()`. For example, the following code will print `['My', 'dog', 'is', 'great']`:

In [0]:
my_string = "My dog is great"
split_string = my_string.split(" ")
print(split_string)

### Capitalization
You can also change the case of a string using the following functions:
* `upper()`
* `lower()`
* `title()`
* `capitalize()`
* `swapcase()`

You can try changing the string below to see how each function applies to different strings.

In [0]:
string = "Hello World"
print(string.upper()) # "HELLO WORLD"
print(string.lower()) # "hello world"
print(string.title()) # "Hello World"
print(string.capitalize()) # "Hello world"
print(string.swapcase()) # "hELLO wORLD"

### Input Validation
You can also use functions to check what kinds of characters a string contains, or how it starts and ends. Each of these returns `True` or `False`. Here are some example functions:
* `isalnum()`
* `isalpha()`
* `isdigit()`
* `istitle()`
* `isupper()`
* `islower()`
* `isspace()`
* `endswith()`
* `startswith()`
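
Here is a short sketch of a few of these checks. The sample strings and variable names are assumptions chosen for illustration.

```python
# Hypothetical sample values, chosen only to illustrate each check.
only_letters_and_digits = "abc123".isalnum()    # letters and digits only
only_letters = "abc123".isalpha()               # fails because of the digits
only_digits = "2024".isdigit()
title_case = "Hello World".istitle()
only_whitespace = "   ".isspace()
is_text_file = "report.txt".endswith(".txt")
starts_with_rep = "report.txt".startswith("rep")

print(only_letters_and_digits)  # True
print(only_letters)             # False
print(only_digits)              # True
print(title_case)               # True
print(only_whitespace)          # True
print(is_text_file)             # True
print(starts_with_rep)          # True
```

Because each call returns a boolean, the result can be used directly in an `if` statement, for example to reject input that is not all digits.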

## Exercise
Complete the function below. If the string passed in starts with an uppercase letter, return the whole string as uppercase. Otherwise, if the letter 'p' occurs 2 or more times in the string, return the first 2 characters of the string. Otherwise, return the string concatenated with "HackBU is cool".

In [0]:
def stringsAreFun(string):
  #Implement me!
  pass



assert(stringsAreFun("Pickles") == "PICKLES")
assert(stringsAreFun("apples") == "ap")
assert(stringsAreFun("it's true that ") == "it's true that HackBU is cool")
print("Good Job, you completed this exercise!")

Good Job, you completed this exercise!


## Regular Expressions

Another way to manipulate strings in Python is with regular expressions, often shortened to regex.

The most important characters in a regex pattern are `. ^ $ * + ? { } [ ] \ | ( )`, and these are known as metacharacters.

### Common Metacharacters

`[]` holds the characters you wish to match or complement

`^` indicates the complement when it is the first character inside `[]`

`.` matches any character except a newline

`*` matches the previous item zero or more times, as many times as possible

`\` escapes a metacharacter so that it is matched literally, or introduces a set pattern, some of which are listed under Set Patterns
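To make these metacharacters concrete, here is a small sketch that uses `re.findall`, which returns a list of the substrings a pattern matches. The patterns and sample strings are assumptions picked for illustration.

```python
import re

# [] matches any single character listed inside it.
listed = re.findall('[abc]', 'cabbage')

# ^ as the first character inside [] matches anything NOT listed.
not_listed = re.findall('[^abc]', 'cabbage')

# . matches any character except a newline, so 'c\nt' is skipped.
any_char = re.findall('c.t', 'cat cot c\nt')

# * repeats the previous item zero or more times.
repeated = re.findall('ab*', 'a ab abbb')

# \ escapes a metacharacter; the raw string r'...' keeps the backslash intact.
literal_dots = re.findall(r'\.', 'a.b.c')

print(listed)        # ['c', 'a', 'b', 'b', 'a']
print(not_listed)    # ['g', 'e']
print(any_char)      # ['cat', 'cot']
print(repeated)      # ['a', 'ab', 'abbb']
print(literal_dots)  # ['.', '.']
```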

### Set Patterns:

\d

    Matches any decimal digit; this is equivalent to the class [0-9].
    
    
\D

    Matches any non-digit character; this is equivalent to the class [^0-9].
    
    
\s

    Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].
    
    
\S

    Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].
    
    
\w

    Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].
    
    
    
\W

    Matches any non-alphanumeric character; this is equivalent to the class [^a-zA-Z0-9_].
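
The set patterns above can be tried out with a sketch like the following. The sample text is hypothetical, and `re.findall` is used here only because it returns the matched substrings as a list.

```python
import re

# Raw strings (r'...') stop Python from treating the backslash as its own escape.
sample = "Room 42\tB_7!"  # hypothetical sample text containing a tab

digits = re.findall(r'\d', sample)
whitespace = re.findall(r'\s', sample)
non_word = re.findall(r'\W', sample)
word_chars = re.findall(r'\w', "B_7!")
non_digits = re.findall(r'\D', "a1b2")
non_space = re.findall(r'\S', "a b")

print(digits)      # ['4', '2', '7']
print(whitespace)  # [' ', '\t']
print(non_word)    # [' ', '\t', '!']
print(word_chars)  # ['B', '_', '7']
print(non_digits)  # ['a', 'b']
print(non_space)   # ['a', 'b']
```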
    

### Functions:

After importing `re`, we call `re.compile('re_here')`, where `re_here` is a regex pattern.
Assign the result to a variable, which will then hold the compiled pattern (called `my_pattern` here).

`my_pattern.match(string)` returns a match object if the pattern matches at the beginning of the string, or `None` if it does not

`my_pattern.search(string)` returns a match object for the first place the pattern matches anywhere in the string, or `None` if it does not match anywhere

`my_pattern.findall(string)` returns a list of all the substrings the pattern matches (plain strings, not match objects, for a pattern without groups)

A match object is an object type that contains information about the match: where it starts and ends, the substring it matched, and more. Match objects always have a boolean value of `True`.

On match objects we can call the functions `group()`, `start()`, and `end()` to return the matched substring and its starting and ending indexes, respectively.

In [3]:
import re
rule = re.compile('[a-z]*')
match = rule.match('kittens')
print(match)
print(match.group())

<_sre.SRE_Match object at 0x7f47a00a16b0>
kittens
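
The remaining calls, `search()`, `findall()`, `start()`, and `end()`, can be sketched in the same way. The pattern, the sample text, and the variable names below are assumptions made for illustration.

```python
import re

# One digit followed by zero or more further digits.
number_rule = re.compile('[0-9][0-9]*')
text = "abc 123 def 45"  # hypothetical sample text

# match() only looks at the beginning of the string, which is "a" here.
at_start = number_rule.match(text)
print(at_start)  # None

# search() scans the whole string and returns the first match.
found = number_rule.search(text)
if found:  # a match object is truthy, None is not
    print(found.group())  # 123
    print(found.start())  # 4
    print(found.end())    # 7

# findall() returns every matched substring as a list of strings.
all_numbers = number_rule.findall(text)
print(all_numbers)  # ['123', '45']
```

Checking the result with `if found:` before calling `group()` avoids an error when there is no match, since `search()` and `match()` return `None` in that case.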


### Now try to find whether the string stored in `variable` has any substrings that contain a vowel followed by a number, using a regular expression

In [4]:
import re
variable = "gjfh8ohjewoiryu83hjfyh6uhjb7"


### More String Functions 
You can find out more about Python's built-in string methods [here](https://www.w3schools.com/python/python_ref_string.asp).

In [0]:
# Try manipulating some strings here!