## Introduction to Variables: Strings

### <font color='gray'>Introduction</font>
As we already know from living in the digital age and from the lessons we've seen so far, programming is a powerful tool for answering questions about data. It allows us to collect, clean up, and format data and then perform calculations on it.
Much of our digital information is in the form of text, for example song lyrics and emails. To clean up and format that text with Python, we need to become familiar with our first type of data, the String.

### <font color='gray'>Objectives</font>
By the end of this lesson, you will be able to:
* Understand and use the **String** (`str`) data type
* Understand, explain and use the correct data types for various types of information

### <font color='gray'>Built-in Data Types</font>
In programming, data types are an important concept.

Variables can store data of different types, and different types can do different things.

Python has the following data types built in by default, grouped into these categories:

| Category | Data Types |
| :- | :- |
| Text Type | `str` |
| Numeric Types | `int`, `float`, `complex` |
| Sequence Types | `list`, `tuple`, `range` |
| Mapping Type | `dict` |
| Set Types | `set`, `frozenset` |
| Boolean Type | `bool` |
| Binary Types | `bytes`, `bytearray`, `memoryview` |
| None Type | `NoneType` |
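
To see the table in action, the short sketch below creates one example value for several of these categories. It then asks Python for each value's type name with the `type()` function, which is covered in more detail later in this lesson. The dictionary name `examples` and the chosen values are only illustrations.

```python
# One illustrative value per category from the table above
examples = {
    "Text": "hello",
    "Numeric": 42,
    "Sequence": [1, 2, 3],
    "Mapping": {"a": 1},
    "Set": {1, 2},
    "Boolean": True,
    "Binary": b"hi",
    "None": None,
}

for category, value in examples.items():
    # type(value).__name__ gives the short type name, e.g. 'str' or 'int'
    print(category, "->", type(value).__name__)
```
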

### <font color='gray'>What are Strings?</font>

A lot of information in the world is in the form of text. To capture this information and operate on it in Python we take this text and make it into the **String** (`str`) data type.

Below, we have the word hello. By putting quotes (`""` or `''`) around it, we create a string.

```python
"hello"
```
When programmers say *string*, what they mean is text.  When programmers say *data type*, they just mean type of data.  We can think of `'hello'` as an instance of the string data type.

Here are a few other types of data in Python that we will talk more about in later lessons:
```python
100 # Integer
10.0 # Float
True # Boolean
```

Since there are several types of data in Python, we can discover the type of any piece of data by calling, or executing, the `type()` function. By calling or executing a function, we mean running the function so that it executes the code within it.

Let's look at an example below:

> **Note:** Press Shift + Enter to run the code cell below. The output that appears under the cell is the value returned by the `type()` function.

```python
type("hello")
```
```
str
```

We need to pay attention to the type of data we are working with, because different types behave differently and support different operations and methods.

For example, to create a new string (or to *initialize* a string) we cannot simply type letters. Instead, we need to be very explicit and tell Python it is about to see some text. We do this by surrounding our text with quotes, `""`.  If we don't do that or end our quotation marks too early, Python will throw an error.

```python
"This is a properly formatted string!"
```
```
'This is a properly formatted string!'
```

```python
"Th"is will throw an error!
```
```
SyntaxError: invalid syntax
```

> **Note:** double quotes and single quotes can be used interchangeably in Python; however, for readability it is important that we stay consistent. At first, it might seem strange how picky programmers are about details like this, but after a couple of years of coding, you too might end up in a fight like [this one](https://www.youtube.com/watch?v=SsoOG6ZeyUI)!
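
Being able to choose either kind of quote is handy when the text itself contains a quote character. The minimal sketch below uses illustrative variable names (`single`, `double`, `phrase`, `escaped`). It shows that both quote styles produce the same string, and that an apostrophe can either sit inside double quotes or be escaped with a backslash.

```python
single = 'hello'
double = "hello"
print(single == double)   # True: both quote styles create the same string

# Use double quotes when the text contains an apostrophe...
phrase = "It's a string"
# ...or escape the apostrophe with a backslash inside single quotes
escaped = 'It\'s a string'
print(phrase == escaped)  # True
```
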

Strings can contain numbers as well. A value may look like 42, but if it is wrapped in quotes, as in `'42'`, it is a string. See the difference below.

```python
type('42')
```
```
str
```

```python
type(42)
```
```
int
```

```python
type(42.0)
```
```
float
```
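
Because `'42'` is text and `42` is a number, the same operator can behave differently on them. The sketch below uses illustrative variable names. It shows `+` joining two strings but adding two integers, and `int()` and `str()` converting explicitly between the two types.

```python
text_version = '42'
number_version = 42

# + on two strings joins (concatenates) them
print(text_version + '1')         # 421
# + on two integers adds them
print(number_version + 1)         # 43

# Convert between the two types explicitly
print(int(text_version) + 1)      # 43
print(str(number_version) + '1')  # 421
```
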

### <font color='gray'>Changing Data With Built-In Methods</font>  
Python is picky like this for a reason: once it knows we are working with a string, it gives us specific functionality for operating on strings. We call this functionality a <font color='navyblue'>_**function**_</font> or a <font color='navyblue'>_**method**_</font>.

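First, let's store a string in a variable named `x` and print it with the `print()` function:

```python
x = "Hello"
print(x)
```
```
Hello
```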
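
As a small illustration of string methods, the sketch below applies a few of them to the same string. Each call returns a new value, and `x` itself stays `"Hello"`.

```python
x = "Hello"

# Each method returns a new value; x itself is unchanged
print(x.upper())     # HELLO
print(x.lower())     # hello
print(x.count("l"))  # 2
print(x)             # Hello
```
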


### <font color='gray'>String Indexing</font>  
Individual characters can be accessed using square brackets (`[]`); counting starts from zero.

```python
print(x[0])
print(x[1])
```
```
H
e
```


The first character is at index 0. This may seem counterintuitive, but it has historical reasons.

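
Because counting starts at zero, the last character of a string sits at index `len(x) - 1`. The sketch below uses the built-in `len()` function. Asking for an index past the end raises an `IndexError`.

```python
x = "Hello"

print(len(x))         # 5 characters
print(x[len(x) - 1])  # o, the last character

try:
    print(x[5])       # there is no index 5 in a 5-character string
except IndexError as err:
    print("IndexError:", err)
```

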
### <font color='gray'>Substrings</font>

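By putting a colon inside the square brackets, as in `x[start:end]`, you can create a substring. It runs from index `start` up to, but not including, index `end`. If the start or end number is left out, Python assumes you mean the beginning or the end of the string, respectively.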

Let's try the example below:

```python
x = "hello world"
s = x[0:3]  # characters at indexes 0, 1 and 2
print(s)
s = x[:3]   # start omitted: same as x[0:3]
print(s)
```
```
hel
hel
```
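
A few more variations on the same string, as a sketch. The character at the `end` index is never included, and an omitted `start` or `end` defaults to the beginning or end of the string.

```python
x = "hello world"

print(x[6:])         # world: from index 6 to the end
print(x[:5])         # hello: from the start up to (not including) index 5
print(x[:])          # hello world: a full copy of the string
print(repr(x[3:3]))  # '': an empty string, because start == end
```
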


The following example combines several string operations: printing text, combining text with numbers, and accessing characters by index and by slice.

Let's try it!

```python
x = "Nancy"
print(x)

# Combine numbers and text
s = "My lucky number is %d, what is yours?" % 7
print(s)

# alternative method of combining numbers and text
s = "My lucky number is " + str(7) + ", what is yours?"
print(s)

# another method of combining numbers and text
s = f"My lucky number is {7}, what is yours?"
print(s)

# yet another method of combining numbers and text
s = "My lucky number is {}, what is yours?".format(7)
print(s)

# print character by index
print(x[0])

# print piece of string
print(x[0:3])
```
```
Nancy
My lucky number is 7, what is yours?
My lucky number is 7, what is yours?
My lucky number is 7, what is yours?
My lucky number is 7, what is yours?
N
Nan
```


More <font color='gray'>Python</font> methods that work with strings:  

SYNTAX:
```python
str.methodname()
```  
<ul>
    <li>count()</li>
    <li>lower()</li>
    <li>upper()</li>
    <li>find()</li>
    <li>strip()</li>
    <li>title()</li>
    <li>endswith()</li>
    <li>startswith()</li>
    <li>replace()</li>
</ul>
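
Most of these methods are demonstrated in the next cells. `find()` is worth a separate look. It returns the index where a substring first starts, or `-1` if the substring is not found. A minimal sketch:

```python
s = "Hello World"

print(s.find("World"))          # 6: "World" starts at index 6
print(s.find("world"))          # -1: find() is case-sensitive, so no match
print(s.lower().find("world"))  # 6: lowercase first for a case-insensitive search
```
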

```python
s = "Hello World"
print(s)

# count the number of 'l' characters
print(s.count('l'))

# slice the first five characters
print(s[:5])

# convert everything to lowercase
print(s.lower())

# convert everything to uppercase
print(s.upper())

# capitalize the first letter of each word
print(s.title())
```
```
Hello World
3
Hello
hello world
HELLO WORLD
Hello World
```


```python
u = "     amist kilo University      "
print(u)

# strip leading and trailing whitespace
print(u.strip())

# check whether u ends with 'ity' (False: u still ends with spaces)
print(u.endswith('ity'))

# check whether u starts with two spaces
print(u.startswith('  '))

# replace 'amist' with 'arat'
print(u.replace('amist', 'arat'))
```
```
     amist kilo University      
amist kilo University
False
True
     arat kilo University      
```


To make the effect of `strip()` clear, we can store the stripped string in a new variable and run the same checks on it:

```python
u = "     amist kilo University      "
print(u)

# strip leading and trailing whitespace, keeping the result
new_u = u.strip()
print(new_u)

# check whether new_u ends with 'ity'
print(new_u.endswith('ity'))

# check whether new_u starts with two spaces
print(new_u.startswith('  '))

# replace 'amist' with 'arat'
print(new_u.replace('amist', 'arat'))
```
```
     amist kilo University      
amist kilo University
True
False
arat kilo University
```
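
Notice that `u` itself never changed. Strings in Python are *immutable*, so methods like `strip()` and `replace()` return a new string instead of modifying the original. That is why the cell above stores the result in `new_u`. A quick check:

```python
u = "     amist kilo University      "
new_u = u.strip()

print(repr(u))      # the original still has its surrounding spaces
print(repr(new_u))  # the stripped copy

# Trying to change a character in place raises a TypeError
try:
    u[0] = "A"
except TypeError as err:
    print("TypeError:", err)
```
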


### Exercise

You have the text string `'name Dr. sisay sfearaerereg '`. Can you pull `'Dr. sisay'` out of it?

```python
s = 'name Dr. sisay sfearaerereg '
ix = s.lower().find('dr')
s[ix:].strip()
```
```
'Dr. sisay sfearaerereg'
```

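Close, but not quite. `find()` located the start of `Dr.`, but `strip()` only removed the trailing space, so the extra text `sfearaerereg` is still attached.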
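
One way to finish the exercise, sketched below, is to split the text that follows `Dr.` into words and keep only the first two. This assumes the title and the name are the first two words after the match.

```python
s = 'name Dr. sisay sfearaerereg '

# Locate "dr" case-insensitively, then keep the text from there on
ix = s.lower().find('dr')
words = s[ix:].split()     # ['Dr.', 'sisay', 'sfearaerereg']

# Keep only the title and the first name
result = ' '.join(words[:2])
print(result)              # Dr. sisay
```
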

As we can see in the examples above, we can operate on a datatype using the following format:   
```code
[INSTANCE OF A DATATYPE] [DOT] [METHOD NAME] [PARENTHESES]
```

For example, `u.endswith('ity')` from the cells above follows this format and returns a `True` or `False` value.

As you can see in this notebook, most of our operations on data will follow the data-dot-method_name-parentheses format.

##  Discovering New Methods

You may be starting to worry about there being too many methods to keep track of. Let's ask Python for help with finding more information about what we can do with strings.

The `help()` function in Python comes built-in and is like an old school Alexa. We give our prompt or *data type* to the `help()` function and it tells us everything it knows about that data type.

Let's see what happens when we type `help(str)`.

```python
help(str)
```
```
Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  ...
```

The output gives us a lot of information about the data type. This includes the built-in methods we can use to operate on data of that type (here, strings).

**Note:** *If you type the `help()` function in your terminal and the output is longer than the window, you can press the letter `q` to exit back to normal operation.*

Holy cow, that's a lot of words! If we scroll down to the `capitalize` method, things begin to make more sense. Here is what the help text says about it:

```python
capitalize(...)

    S.capitalize() -> str
    Return a capitalized version of S, i.e. make the first character
    have upper case and the rest lower case.
```

Our next step is to use our formula of datatype-dot-method name-parentheses, and see what happens next.

```python
"smithers".capitalize()
```
```
'Smithers'
```

## Tips going forward

There are many other examples of methods to use with strings in Python. In fact, they are all listed when you called `help(str)` above. What is important when learning how to code in Python is to think of the right question to ask when you are stuck and learn how to read the documentation. There is a cycle to getting comfortable with Python functions and methods.

In this lesson, we went through that cycle:
* Guess: We just tried something and looked to the error message for clues as to what to do next.
* help(str): We saw a nice way to learn about new methods, then we took a guess to test our understanding
* Following a pattern: We started with a simple method like calling upper, took a moment to break this down into a pattern, and then tried this pattern again to call other methods

Here is one more method of discovery:  **just ask Google**.  For example, look what happens when we ask Google about capitalization.

![](https://learn-verified.s3.amazonaws.com/data-science-assets/ask-google.png)

[A great link with a detailed answer.](https://stackoverflow.com/a/1549644)

![](https://learn-verified.s3.amazonaws.com/data-science-assets/stack-overflow.png)

Then we try this new method out ourselves to see if this Stack Overflow user is right (they usually are).

```python
"hello world".title()
```
```
'Hello World'
```
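
`capitalize()` and `title()` are easy to mix up. The sketch below runs both, plus `upper()`, on the same string to show the difference.

```python
s = "hello world"

print(s.capitalize())  # Hello world: only the first character is uppercased
print(s.title())       # Hello World: the first letter of every word
print(s.upper())       # HELLO WORLD: every letter
```
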

## Strings are Arrays
Like strings in many other popular programming languages, strings in Python behave like arrays. In Python 3, a string is an ordered sequence of Unicode characters.

However, Python does not have a separate character data type; a single character is simply a string with a length of 1.

Square brackets can be used to access elements of the string.

```python
# Get the character at position 1 (remember that the first character has position 0)
a = "Hello, World!"
print(a[1])
```
```
e
```
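
To confirm that Python has no separate character type, the sketch below checks the type and length of a single indexed character.

```python
a = "Hello, World!"
char = a[1]

print(char)        # e
print(type(char))  # <class 'str'>: a "character" is just a str
print(len(char))   # 1
```
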


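```python
a.capitalize()
```
```
'Hello, world!'
```

Note that `capitalize()` also lowercases the rest of the string, so `World` becomes `world`.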

```python
name_1 = 'Dr. xxxx yyyy, jr III'
name_2 = 'his name is Dr. aaaa bbb zzz'
name_3 = 'he is known as Dr. abera'
name_4 = 'he is known as Dr. Miki desta'

# find where "Dr" starts (lower() makes the search case-insensitive)
ix = name_4.lower().find('dr')
his_name = name_4[ix:]     # 'Dr. Miki desta'
parts = his_name.split(' ')  # ['Dr.', 'Miki', 'desta']
his_fname = parts[1]
# some names (like name_3) have no last name, so guard against a missing third word
his_lname = parts[2] if len(parts) >= 3 else 'last name not found'
print(his_fname)
print(his_lname)
```
```
Miki
desta
```
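
The cell above handles one string at a time. As a sketch, the same steps can be wrapped in a small helper so they run on every example. The function name `first_and_last_after_dr` is hypothetical. It searches for `'dr.'` (with the period) so that names such as "Andrew" are less likely to produce a false match, and it uses `strip(',')` to drop a trailing comma such as the one in `'yyyy,'`.

```python
def first_and_last_after_dr(text):
    """Return (first_name, last_name) for the words after 'Dr.'; missing parts are None."""
    ix = text.lower().find('dr.')
    if ix == -1:
        return None, None          # no "Dr." in the text
    parts = text[ix:].split()      # e.g. ['Dr.', 'Miki', 'desta']
    first = parts[1] if len(parts) > 1 else None
    # strip(',') removes a trailing comma, as in 'yyyy,'
    last = parts[2].strip(',') if len(parts) > 2 else None
    return first, last


names = [
    'Dr. xxxx yyyy, jr III',
    'his name is Dr. aaaa bbb zzz',
    'he is known as Dr. abera',
    'he is known as Dr. Miki desta',
]
for name in names:
    print(first_and_last_after_dr(name))
```
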



```python
name_1.split(" ")
```
```
['Dr.', 'xxxx', 'yyyy,', 'jr', 'III']
```

```python
fname, lname = 'sisay', 'semere'
full_name_1 = fname + ' ' + lname           # concatenation with +
full_name_2 = f'{fname} {lname}'            # f-string
full_name_3 = '{} {}'.format(fname, lname)  # format() method
print(full_name_1)
print(full_name_2)
print(full_name_3)
```
```
sisay semere
sisay semere
sisay semere
```




## Looping Through a String
Since strings behave like arrays, we can loop through the characters in a string with a `for` loop.

```python
for x in "Kitfo":
    print(x)
```
```
K
i
t
f
o
```
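
If you also need each character's position, the built-in `enumerate()` function yields pairs of index and character. A short sketch (the name `word` is illustrative):

```python
word = "Kitfo"

# enumerate() pairs each character with its index, starting at 0
for i, ch in enumerate(word):
    print(i, ch)

pairs = list(enumerate(word))
print(pairs)  # [(0, 'K'), (1, 'i'), (2, 't'), (3, 'f'), (4, 'o')]
```
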


## Slicing
You can return a range of characters by using the slice syntax.

Specify the start index and the end index, separated by a colon, to return a part of the string. The character at the end index is not included.


<img src='./images/string-slicing.png' width=500>

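```python
b = "Hello, Ethiopians!"
print(b[7:15])
```
```
Ethiopia
```

```python
print(b[7:-3])  # -3 counts back from the end, so this equals b[7:15]
```
```
Ethiopia
```

```python
b[7:]
```
```
'Ethiopians!'
```

```python
b[:5]
```
```
'Hello'
```

```python
b = "Hello, Ethiopians!"
print(b[-11:-2])
```
```
Ethiopian
```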


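## Negative Indexing
Use negative indexes to count from the end of the string: `-1` is the last character, `-2` the second to last, and so on. The `b[7:-3]` and `b[-11:-2]` slices above already use them.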
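
As a sketch using the same `b` as the slicing examples, here are a few common negative-index patterns. The last line uses a negative *step* (`[::-1]`), which walks the string backwards.

```python
b = "Hello, Ethiopians!"

print(b[-1])    # !: the last character
print(b[-11:])  # Ethiopians!: the last 11 characters
print(b[:-1])   # Hello, Ethiopians: everything except the last character
print(b[::-1])  # !snaipoihtE ,olleH: the whole string reversed
```
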

## Remove Whitespace
Text often has leading or trailing whitespace (spaces, tabs, or newlines before and after the actual text), and very often you want to remove it.

```python
a = "          Hello, Ethiopians!          "
a
```
```
'          Hello, Ethiopians!          '
```

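Keep in mind that string comparisons are case-sensitive, so `'hello'` and `'Hello'` are different strings:

```python
'hello' != 'Hello'
```
```
True
```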

```python
print(a.strip())  # returns "Hello, Ethiopians!"
```
```
Hello, Ethiopians!
```
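
`strip()` removes whitespace from both ends. Its relatives `lstrip()` and `rstrip()` remove it from only the left or only the right side. All three also accept an optional string of characters to remove instead of whitespace, as this sketch shows:

```python
a = "          Hello, Ethiopians!          "

print(repr(a.lstrip()))  # whitespace removed on the left only
print(repr(a.rstrip()))  # whitespace removed on the right only

# Remove specific characters instead of whitespace
print("***Hello***".strip("*"))  # Hello
```
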


## Replace String

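The `replace()` method returns a copy of the string with one substring replaced by another:

```python
a = "Jello, Ethiopians!"
print(a)
b = a.replace("J", "H")
print(b)
```
```
Jello, Ethiopians!
Hello, Ethiopians!
```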


## Split String
The `split()` method returns a list whose items are the pieces of text between occurrences of the specified separator.

```python
a = "Hello, Ethiopians!"
print(a.split(","))  # returns ['Hello', ' Ethiopians!']
```
```
['Hello', ' Ethiopians!']
```
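
Called with no argument, `split()` splits on any run of whitespace and drops the empty pieces, which is often what you want when splitting text into words. The optional `maxsplit` argument limits how many splits are made. The variable names below are illustrative.

```python
sentence = "  Hello   dear   Ethiopians!  "

print(sentence.split())     # ['Hello', 'dear', 'Ethiopians!']
print(sentence.split(" "))  # splitting on a single space keeps empty strings

row = "name,age,city"
print(row.split(",", maxsplit=1))  # ['name', 'age,city']
```
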


## String Methods
Learn more about string methods in the [String Methods Reference](https://www.w3schools.com/python/python_ref_string.asp).

## String Format
As we learned in the Python Variables chapter, we cannot combine strings and numbers like this:

```python
age = 24
msg = "My name is Nati, I am forever" + age
print(msg)
```
```
TypeError: can only concatenate str (not "int") to str
```

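Storing the age as a string (or converting it with `str(age)`) makes the concatenation work:

```python
age = "24"
msg = "My name is Nati, I am forever " + age
print(msg)
```
```
My name is Nati, I am forever 24
```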


But we can combine strings and numbers by using the `format()` method!

The `format()` method takes the passed arguments, formats them, and places them in the string where the placeholders `{}` are:

```python
age = 24
msg = "My name is John, and I am {}"
print(msg.format(age))
```
```
My name is John, and I am 24
```


The `format()` method takes any number of arguments, which are placed into the placeholders in order:

```python
quantity = 3
itemno = 567
price = 49.95
myorder = "I want {} pieces of item {} for {} dollars."
print(myorder.format(quantity, itemno, price))
```
```
I want 3 pieces of item 567 for 49.95 dollars.
```
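
Placeholders can also carry a position or a format specification. As a sketch reusing the same order values, `{1}` refers to the second argument, and `{:.2f}` prints a number with two decimal places. An f-string can express the same formatting inline. The `total` variable is illustrative.

```python
quantity = 3
itemno = 567
price = 49.95

# {1} and {0} pick arguments by position
print("Item {1}: {0} pieces".format(quantity, itemno))

# :.2f formats a float with two decimal places
total = quantity * price
print("Total: {:.2f} dollars".format(total))

# The same formatting written as an f-string
print(f"Total: {total:.2f} dollars")
```
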


See the image below for another example:

<img src='./images/glue-job.png' width=500>

## Multiline Strings
You can assign a multiline string to a variable by using three quotes (`"""` or `'''`):

```python
a = """This
is a
multiline
string."""
print(a)
```
```
This
is a
multiline
string.
```
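
Under the hood, a multiline string is an ordinary string with newline characters (`\n`) where the line breaks are. The sketch below makes them visible with `repr()` and splits the text back into lines with `splitlines()`.

```python
a = """This
is a
multiline
string."""

print(repr(a))         # 'This\nis a\nmultiline\nstring.'
print(a.splitlines())  # ['This', 'is a', 'multiline', 'string.']
```
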



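```python
s = '44 cool dress you have'
type(s)
```
```
str
```

Our goal is to turn this string into `44:- cool:- dress:- you:- have`.

```python
print(s)
```
```
44 cool dress you have
```

One way is to split the string into words and glue them back together by hand:

```python
words = s.split(' ')
words[0] + ':- ' + words[1] + ':- ' + words[2] + ':- ' + words[3] + ':- ' + words[4]
```
```
'44:- cool:- dress:- you:- have'
```

A much shorter way is the `join()` method, which puts the separator string between every item of a list:

```python
':- '.join(s.split(' '))
```
```
'44:- cool:- dress:- you:- have'
```

Even though `s` starts with `44`, it is a string. Without quotes, `44` is an integer:

```python
ss = 44
type(ss)
```
```
int
```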

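Avoid reusing built-in names such as `type` as variable names. The line below would replace the built-in `type()` function with a string. After that, calls like `type(42)` in the same session fail with a `TypeError`:

```python
# Don't do this: it shadows the built-in type() function
type = 'hello'
```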

And our work is done.

Feel free to look at [other common string operations here.](https://docs.python.org/2/library/string.html)

## Summary
In this lesson, we learned about our first datatype in Python: the string.  A string is just text. We indicate to Python that we are writing a string by surrounding our content with quotation marks. Once we do this, we can operate on this string by calling methods like `upper` or `endswith`. We identified a general pattern for calling methods on datatypes: 'instance of a datatype-dot-method name-parentheses'.

The second thing we learned was different ways to learn about methods. We saw the importance of guessing and experimenting, and how doing so can produce error messages that provide clues. We also saw how to ask questions about a data type by calling `help` with the data type name, as in `help(str)`. Finally, we saw that we can ask Google. This kind of exploration is a skill we'll build up over time, and this course will provide guidance and practice along the way.