# Strings #

A string is a sequence of characters. Python has no separate character data type: a single character is simply a string with a length of 1.
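
As a small illustration of that point, the sketch below (the variable names are only for illustration) indexes a string and checks that the result is itself a string of length 1:

```python
word = "Python"
first = word[0]              # indexing a string returns another string

print(type(word).__name__)   # str
print(type(first).__name__)  # str
print(len(first))            # 1

# Strings are sequences, so they can be iterated over one character at a time.
letters = [ch for ch in word]
print(letters)               # ['P', 'y', 't', 'h', 'o', 'n']
```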

Below are examples of different string literals being assigned to variables.
**Note** the different kinds of quotation marks.
**Note** that with strings the '+' operator means "concatenate":
___

In [16]:
# Assignment
str1 = "I have"
str2 = 'error.'
str3 = '1'
str4 = """.
This is a multi
line string
."""

# Multiline
print(str4)

# Concatenation
line = str1 + " " + 1 + " " + str2
print(line)

.
This is a multi
line string
.


TypeError: can only concatenate str (not "int") to str

____
What happened there?

Be **careful** when working with different types. For example, to perform arithmetic with a number stored as a string and an actual integer value, convert the string first:
____

In [3]:
x = int(str3) + 5
print(x)

6
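
The conversion also works in the other direction. The following sketch, which assumes an integer variable named count purely for illustration, uses str() so that the failing concatenation from the first cell succeeds:

```python
str1 = "I have"
str2 = "error."
count = 1  # an integer, not a string (illustrative name)

# str() turns the integer into text so that '+' can concatenate it.
line = str1 + " " + str(count) + " " + str2
print(line)   # I have 1 error.

# int() goes the other way: from text to a number.
total = int("1") + 5
print(total)  # 6
```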


____
Input normally arrives as strings, so numeric input has to be converted before it is used. Receiving input as a string gives you more control over error situations and bad user input.
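
One possible way to exercise that control is sketched below. The helper name parse_int is hypothetical, and the list of sample strings stands in for real user input; int() raises ValueError when the text is not a valid integer.

```python
def parse_int(text):
    """Return text as an int, or None if it is not a valid integer."""
    try:
        return int(text.strip())
    except ValueError:
        return None

# Sample strings standing in for what a user might type.
for raw in ["42", " 7 ", "abc", "3.5"]:
    value = parse_int(raw)
    if value is None:
        print(repr(raw), "is not a valid integer")
    else:
        print(repr(raw), "->", value)
```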

Modify the code below to write the last character and the last word of the string.
____

In [6]:
line = str1 + " " + str3 + " " + str2

# Get length of string
n = len(line)
print("Length =", n)

# Print character at specific index
print("One letter: " + line[9])

# Just for the example
if "error" in line:
    i = line.find("have")
    print("Word found at index", i, ":)")
    
# Print second word
print("Second word: " + line[i:6])

Length = 15
One letter: e
Word found at index 2 :)
Second word: have


____
## The String Library ##

You can list the methods available on an object with the *dir* function. You can also check Python's <a href="https://docs.python.org/3/library/stdtypes.html#string-methods">documentation</a> for a detailed reference. Below are a few examples of methods:
____


In [9]:
# A few examples:
print("Examples of a few methods: \n")
line = "example methods"
print(line, "-> upper() ->", line.upper())

# Notice the escape character
print(line, "-> replace(\"example\", \"useful\") ->", line.replace("example", "useful"))

# Notice the tab character
print("    \tI have a tab and spaces", "-> lstrip() ->", "    \tI don't".lstrip())

# Methods. Notice the newline character.
print("\nAvailable Methods: ")
dir(line)

Examples of a few methods: 

example methods -> upper() -> EXAMPLE METHODS
example methods -> replace("example", "useful") -> useful methods
    	I have a tab and spaces -> lstrip() -> I don't

Available Methods: 


['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']
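
Most of that listing consists of special methods whose names start with underscores. A minimal sketch for narrowing the output to the methods you would normally call directly (the variable name public_methods is just illustrative):

```python
line = "example methods"

# Names starting with an underscore are special methods;
# filtering them out leaves the everyday string methods.
public_methods = [name for name in dir(line) if not name.startswith("_")]
print(public_methods)

# Any of the listed methods can then be called on the string.
print(line.title())     # Example Methods
print(line.count("e"))  # 3
```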


____
## String Format ##

The format() method formats the specified values and inserts them into the placeholders of a string. A placeholder is defined with curly brackets: {}.
The placeholders can be identified using named indexes, numbered indexes, or even empty placeholders.
____

In [4]:
txt1 = "My name is {fname}, I'm {age}".format(fname="Jose", age=23)
txt2 = "My name is {0}, I'm {1}".format("Jose", 23)
txt3 = "My name is {}, I'm {}".format("Jose", 23)

print(txt1)
print(txt2)
print(txt3)

My name is Jose, I'm 23
My name is Jose, I'm 23
My name is Jose, I'm 23


____
### Format modifiers ###

| Modifier | Description |
| --- | --- |
| :< | Left-aligns the result (within the available space) |
| :> | Right-aligns the result (within the available space) |
| :^ | Centers the result (within the available space) |
| := | Places the sign at the leftmost position, with any padding between the sign and the digits |
| :+ | Shows a sign for both positive and negative numbers |
| :- | Shows a minus sign for negative values only |
| :  | Inserts a space before positive numbers (and a minus sign before negative numbers) |
| :, | Uses a comma as a thousands separator |
| :_ | Uses an underscore as a thousands separator |
| :b | Binary format |
| :c | Converts the value into the corresponding Unicode character |
| :d | Decimal format |
| :e | Scientific format, with a lowercase e |
| :E | Scientific format, with an uppercase E |
| :f | Fixed-point number format |
| :F | Fixed-point number format, in uppercase (shows inf and nan as INF and NAN) |
| :g | General format |
| :G | General format (using an uppercase E for scientific notation) |
| :o | Octal format |
| :x | Hex format, lowercase |
| :X | Hex format, uppercase |
| :n | Number format (uses the separators of the current locale) |
| :% | Percentage format (multiplies by 100 and appends a percent sign) |

Modifier example:
____

In [8]:
number = input('Enter number: ')
print("Hex:{:X}".format(int(number)))

Enter number:  14


Hex:E
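
A few more modifiers from the table can be tried without user input. In this sketch the sample values and the field width of 12 are arbitrary choices made for illustration:

```python
value = 1234567.891  # arbitrary sample number

print("[{:<12}]".format("left"))    # [left        ]
print("[{:>12}]".format("right"))   # [       right]
print("[{:^12}]".format("center"))  # [   center   ]
print("{:,}".format(1234567))       # 1,234,567
print("{:.2f}".format(value))       # 1234567.89
print("{:+d}".format(42))           # +42
print("{:b}".format(10))            # 1010
print("{:x}".format(255))           # ff
print("{:.1%}".format(0.256))       # 25.6%
print("{:e}".format(value))         # 1.234568e+06
```

The number after the alignment character is the total width of the field, which is the "available space" mentioned in the table.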


____
# File Handles #

A file handle can be iterated over like a sequence of strings, where each line of the file is one string. Before working on a file we need to tell Python which file we are going to open and what we will be doing with it.

*file_object = open('file_name', 'mode')  # Returns a handle (not the file contents) used to manipulate the file*
                                   
### file_name ###

The file_name includes the file extension and assumes the file is in the current working directory. If the file location is elsewhere, provide the absolute or relative path.

### mode ###
This argument is optional. The available modes are:<br>
- 'r' - Reads from a file and raises an error if the file does not exist (default). <br>
- 'w' - Writes to a file; creates the file if it does not exist and overwrites an existing file. <br>
- 'x' - Exclusive creation; fails if the file already exists.<br>
- 'a' - Appends to the end of a file; creates the file if it does not exist and keeps any existing content. <br>
- 'b' - Binary mode. Use this mode for non-textual files, such as images. <br>
- 't' - Text mode. Use only for textual files (default). <br>
- '+' - Opens the file for updating (both reading and writing).<br>
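
The difference between the modes is easiest to see by trying them. The sketch below works inside a temporary directory so that no real file is touched; the file name demo.txt is hypothetical.

```python
import os
import tempfile

exclusive_failed = False

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")  # hypothetical file name

    f = open(path, "w")   # 'w' creates the file (or empties an existing one)
    f.write("first line\n")
    f.close()

    f = open(path, "a")   # 'a' keeps the content and writes at the end
    f.write("second line\n")
    f.close()

    f = open(path, "r")   # 'r' reads; the file must already exist
    content = f.read()
    f.close()
    print(content)

    try:
        open(path, "x")   # 'x' refuses to touch an existing file
    except FileExistsError:
        exclusive_failed = True
        print("'x' mode failed: the file already exists")
```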
____

In [8]:
file_handle = open('text_file.txt', 'r')
print(file_handle)

<_io.TextIOWrapper name='text_file.txt' mode='r' encoding='UTF-8'>


____
You can read the whole file at once:
____

In [17]:
file = file_handle.read()
print("Characters in this file", len(file))

Characters in this file 759


____
That returns one big string. Alternatively, you can read the file line by line using a loop:
____

In [35]:
for my_line in file_handle:
    print(my_line)

____
What happened there?
____

In [9]:
file_handle.seek(0)
for my_line in file_handle:
    print(my_line.rstrip())

MIME-Version: 1.0
References: <CABxEEohuqZBoVpsyY4pOFMYixhU2bzfxgs9tRLbUoV2NJMqCJw@mail.gmail.com>
<CAL5Lp9Xyo0mEQ6-c1yAQ+SuKXrT4Xu5y-7BnvnGS4RMjZOBJ=g@mail.gmail.com>
In-Reply-To: <CAL5Lp9Xyo0mEQ6-c1yAQ+SuKXrT4Xu5y-7BnvnGS4RMjZOBJ=g@mail.gmail.com>
From: Chris <c@sigparser.com>
Date: Wed, 9 Jan 2019 08:36:15 -0800
Message-ID: <CABxEEoizOPyCLkq4+FBGNaw7KC2TJDfTZF5dp8xD9aFjDQoL+Q@mail.gmail.com>
Subject: Re: food for thought
To: Paul <p@sigparser.com>

--000000000000382db0057f0910d5
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Ok.  Just a thought.  Got it.

<div><div dir=3D"auto">Ok.=C2=A0 Just a thought.=C2=A0 Got it. =C2=A0</div>=
</div><div><br><div class=3D"gmail_quote"><div dir=3D"ltr">On Wed, Jan 9, 2=


____
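The first loop printed nothing because read() had already moved the file position to the end of the file; seek(0) moves it back to the beginning. Each line read from a file also ends with a newline ('\n') character, and print adds one of its own, so printing the lines unchanged would show extra blank lines that are not present in the file. To avoid that we can **remove the newline** characters from every line by using the rstrip() method.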
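
The behaviour of the file position can be reproduced without a file on disk. This sketch uses io.StringIO, an in-memory object from the standard library that behaves like an open text file, as a stand-in for the real handle; the two sample lines are made up.

```python
import io

# io.StringIO acts like an open text file but lives in memory.
handle = io.StringIO("line one\nline two\n")

whole = handle.read()        # reading moves the position to the end
print(len(whole))            # 18

leftover = list(handle)      # nothing is left to iterate over
print(leftover)              # []

handle.seek(0)               # rewind to the beginning
for my_line in handle:
    # Drop the newline from the file; print() adds its own.
    print(my_line.rstrip("\n"))

handle.close()
```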

Finally, you can also read all lines at once into a single list:
____

In [11]:
file_handle.seek(0)

lines = file_handle.readlines()
print(lines)

['MIME-Version: 1.0\n', 'References: <CABxEEohuqZBoVpsyY4pOFMYixhU2bzfxgs9tRLbUoV2NJMqCJw@mail.gmail.com> \n', '<CAL5Lp9Xyo0mEQ6-c1yAQ+SuKXrT4Xu5y-7BnvnGS4RMjZOBJ=g@mail.gmail.com>\n', 'In-Reply-To: <CAL5Lp9Xyo0mEQ6-c1yAQ+SuKXrT4Xu5y-7BnvnGS4RMjZOBJ=g@mail.gmail.com>\n', 'From: Chris <c@sigparser.com>\n', 'Date: Wed, 9 Jan 2019 08:36:15 -0800\n', 'Message-ID: <CABxEEoizOPyCLkq4+FBGNaw7KC2TJDfTZF5dp8xD9aFjDQoL+Q@mail.gmail.com>\n', 'Subject: Re: food for thought\n', 'To: Paul <p@sigparser.com>\n', '\n', '--000000000000382db0057f0910d5\n', 'Content-Type: text/plain; charset="UTF-8"\n', 'Content-Transfer-Encoding: quoted-printable\n', '\n', 'Ok.  Just a thought.  Got it.\n', '\n', '<div><div dir=3D"auto">Ok.=C2=A0 Just a thought.=C2=A0 Got it. =C2=A0</div>=\n', '</div><div><br><div class=3D"gmail_quote"><div dir=3D"ltr">On Wed, Jan 9, 2=']


____
You can also iterate over a string character by character using a loop:
____

In [12]:
for char in lines[0]:
    print(char)
    
file_handle.close()

M
I
M
E
-
V
e
r
s
i
o
n
:
 
1
.
0




____
### Protections ###

When an error (an exception, as Python calls it) occurs, Python will normally stop and generate an error message.<br>
These exceptions can be handled using the try statement:

- The try block lets you test a block of code for errors.<br>
- The except block lets you handle the error.<br>
- The else block lets you execute code when there is no error.<br>
- The finally block lets you execute code regardless of the result of the try and except blocks.<br>
____

In [5]:
file_name = input('Enter a file name: ')
try:
    handler = open(file_name)
except OSError:  # raised when the file is missing or cannot be opened
    print('File cannot be opened:', file_name)
else:
    count = 0
    for line in handler:
        count = count + 1

    print("There were", count, "lines in", file_name)
    handler.close()

Enter a file name:  text_file.txt


There were 18 lines in text_file.txt
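
The cell above does not use the finally block. A possible variant that does is sketched here; the function name count_lines and the temporary sample file are assumptions made for illustration.

```python
import os
import tempfile

def count_lines(file_name):
    """Return the number of lines in file_name, or None if it cannot be opened."""
    handler = None
    try:
        handler = open(file_name)
    except OSError:
        print("File cannot be opened:", file_name)
        return None
    else:
        return sum(1 for _ in handler)
    finally:
        # Runs on both paths, even though the blocks above return.
        if handler is not None:
            handler.close()

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")  # hypothetical file
    out = open(path, "w")
    out.write("a\nb\nc\n")
    out.close()

    found = count_lines(path)
    missing = count_lines(os.path.join(folder, "missing.txt"))
    print(found, missing)  # 3 None
```

Closing the file in finally means it is released whether or not counting the lines succeeds.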


____
### Searching Through a File ###

Remember the methods from the String Library. Let's use .startswith() to pick out the line we are looking for.

Who is sending this email?
____

In [15]:
file_handle = open('text_file.txt', 'r')

for line in file_handle:
    # Only the header line that names the sender starts with 'From:'
    if line.startswith('From:'):
        print(line)

file_handle.close()


From: Chris <c@sigparser.com>



____
### Exercise ###

In addition to what we did before, create a script that prints the domain name of the sender's email address.
____

____
### Split Function ###

The split() method is especially useful for implementing a tokenizer. A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens.
____

In [1]:
line = "This   is a line we want\nto \t split\n"
words = line.split()
print(words)

line = "this, can be, a csv file, we want to split"
words = line.split(',')
print(words)


['This', 'is', 'a', 'line', 'we', 'want', 'to', 'split']
['this', ' can be', ' a csv file', ' we want to split']
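
Splitting on a comma leaves the surrounding spaces in each field, as the second output shows. The sketch below cleans them up with strip(), limits the number of splits with the optional maxsplit argument, and reverses the operation with join(); the variable names are illustrative.

```python
line = "this, can be, a csv file, we want to split"

# strip() removes the spaces left around each field after splitting.
fields = [field.strip() for field in line.split(",")]
print(fields)  # ['this', 'can be', 'a csv file', 'we want to split']

# The second argument (maxsplit) limits the number of splits from the left.
head, rest = line.split(",", 1)
print(head)        # this
print(repr(rest))  # ' can be, a csv file, we want to split'

# join() is the inverse operation: it glues tokens back together.
sentence = " ".join(["This", "is", "a", "line"])
print(sentence)    # This is a line
```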


____
### Exercise ###

Repeat the previous exercise using the split function.
____

### References ###
- https://www.freecodecamp.org/learn/scientific-computing-with-python
- https://www.w3schools.com/python/python_strings.asp
- https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html#:~:text=A%20tokenizer%20receives%20a%20stream,whenever%20it%20sees%20any%20whitespace.