# PART 1
# Section 7: Handling Texts and Files

## 7.1 - String Operations

<br/>

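Some characteristics of strings are:
- [x] Support some arithmetic operators: `+` (concatenation) and `*` (repetition)
- [x] Are indexable and support slicing
- [x] Are iterable
- [x] Accept special formatting (e.g., f-strings)
- [x] Use the backslash (\\) to introduce escape sequences such as `\n` (newline)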

In [2]:
a = 'Python '

In [3]:
a * 10

'Python Python Python Python Python Python Python Python Python Python '

In [4]:
b = "Python is nice!"

In [8]:
b[4:7]

'on '

In [9]:
list(b)

['P', 'y', 't', 'h', 'o', 'n', ' ', 'i', 's', ' ', 'n', 'i', 'c', 'e', '!']

In [11]:
for i in b:
    print(i)

P
y
t
h
o
n
 
i
s
 
n
i
c
e
!


In [10]:
b

'Python is nice!'

In [17]:
age = 32

f'Rafael is {age} years old'

'Rafael is 32 years old'

In [23]:
print("This is my first sentence.\n 'I \" want to break a line before that sentence")

This is my first sentence.
 'I " want to break a line before that sentence
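The cells above cover `*`, slicing, and iteration. Below is a minimal sketch of three related behaviours they do not show: concatenation with `+`, negative indices and slice steps, and the error raised when you try to change a single character in place. The variable names are only illustrative.

```python
# Concatenation with + and repetition with *
word = 'Python'
joined = word + ' ' + 'rocks'        # 'Python rocks'
line = '-' * 10                      # '----------'

# Positive indices count from 0; negative indices count from the end
first = word[0]                      # 'P'
last = word[-1]                      # 'n'
reversed_word = word[::-1]           # 'nohtyP' (slice with step -1)

# Strings are immutable: assigning to an index raises TypeError
try:
    word[0] = 'J'
except TypeError as err:
    print('Cannot modify a string:', err)

# To "change" a string, build a new one
new_word = 'J' + word[1:]            # 'Jython'
```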


## 7.2 - String Methods

<br/>

**Some string manipulation methods:**

<br/>

| Method | Description |
| :-- | :-- |
| upper() | Returns a copy with all letters in uppercase |
| lower() | Returns a copy with all letters in lowercase |
| replace(arg_old, arg_new) | Returns a copy with every occurrence of arg_old replaced by arg_new |
| find(arg) | Returns the index of the first occurrence of arg, or -1 if it is not found. Note: rfind() returns the last occurrence |
| strip() | Returns a copy without leading and trailing whitespace. Note: rstrip() and lstrip() remove only trailing or only leading whitespace |
| count(arg) | Returns the number of non-overlapping occurrences of arg |
| split(arg) | Splits the string at each occurrence of arg and returns a list of the pieces |
| isupper() | Returns True if all cased characters are uppercase (and there is at least one). Note: islower(), isalpha(), isnumeric() |
| startswith(arg) | Returns True if the string starts with arg. Note: endswith() |

<br/>

**Note:**
<pre>Always check how a method acts on the object. String methods never modify the original string, because strings are immutable: they return a new string, which you must assign to a variable (e.g., a = a.lower()) to keep the result.</pre>
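A short sketch of the points in the table and the note: methods return new strings, `find()` returns -1 when nothing is found, and `count()`, `lstrip()`, `rstrip()`, `isnumeric()` and `isalpha()` in action. The sample strings are illustrative.

```python
text = 'Python is nice'

shout = text.upper()              # 'PYTHON IS NICE' (a new string)
print(text)                       # Python is nice (the original is unchanged)

first_is = text.find('is')        # 7
not_found = text.find('Java')     # -1: the substring does not occur
last_n = text.rfind('n')          # 10: position of the last 'n'
n_count = text.count('n')         # 2

padded = '  data  '
left_clean = padded.lstrip()      # 'data  '
right_clean = padded.rstrip()     # '  data'

year_ok = '2024'.isnumeric()      # True
letters_only = 'nice!'.isalpha()  # False: '!' is not a letter
```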


In [24]:
a = 'PytHon'

In [27]:
a = a.lower()

In [31]:
a = a.upper()

In [46]:
b = 'HelloPWorld!PImPhere'

In [47]:
b = b.replace('P', ' ')

In [48]:
b

'Hello World! Im here'

In [52]:
b.rfind('l')

9

In [54]:
b[10]

'd'

In [55]:
c = '    Python is nice   '

In [59]:
c = c.strip()

In [65]:
c.split('n')

['Pytho', ' is ', 'ice']

In [68]:
c.isupper()

False

In [71]:
b

'Hello World! Im here'

In [73]:
b.startswith('H')

True

In [74]:
b.endswith('e')

True

## 7.3 - String Formatting

<br/>

f-strings, introduced in Python 3.6, make string formatting easier. In earlier versions of Python, use the str.format() method or the % operator instead.

<br/>

#### Syntax of f-strings

```python
>>> result = 10
>>> text_result = f'Result = {result}'
>>> pi = 3.14159
>>> pi_value = f'The value of pi is {pi:.2f}'
```
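As a sketch, here is the same text built with an f-string, with `str.format()` and with the `%` operator, followed by a few common format specifiers. A space before the specifier, as in `{pi: .2f}`, reserves a position for the sign, so positive numbers get a leading space. The values are illustrative.

```python
pi = 3.14159

# Three ways to produce 'The value of pi is 3.14'
with_fstring = f'The value of pi is {pi:.2f}'
with_format = 'The value of pi is {:.2f}'.format(pi)
with_percent = 'The value of pi is %.2f' % pi

# A space before .2f reserves a sign position
spaced = f'[{pi: .2f}]'              # '[ 3.14]'
negative = f'[{-pi: .2f}]'           # '[-3.14]'

# Other common specifiers
left = f'[{"abc":<6}]'               # '[abc   ]': left-aligned in 6 characters
right = f'[{"abc":>6}]'              # '[   abc]': right-aligned
thousands = f'{1234567.891:,.2f}'    # '1,234,567.89'
percent = f'{0.256:.1%}'             # '25.6%'
```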

<br/>

**Note:**
<pre>In addition to f (formatted strings), other string prefixes exist: r for raw strings, b for bytes, and u for Unicode (kept for compatibility, since all Python 3 strings are already Unicode).</pre>
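A minimal sketch of what each of these prefixes changes. The sample strings are illustrative.

```python
# r: raw string, backslashes are kept literally
raw = r'C:\new\folder'
normal = 'C:\\new\\folder'        # the same text with each backslash escaped
print(raw == normal)              # True

# Without r, '\n' is one newline character; with r it is two characters
print(len('\n'), len(r'\n'))      # 1 2

# b: bytes literal (ASCII characters only), a different type from str
data = b'abc'
print(type(data))                 # <class 'bytes'>
print(data.decode('ascii'))       # abc

# u: accepted for compatibility; it produces an ordinary str
print(u'abc' == 'abc')            # True
```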

In [6]:
number = 20
a = f'ten times two is equal to {number}'

In [7]:
a

'ten times two is equal to 20'

In [8]:
c = 1 / 3
c

0.3333333333333333

In [12]:
f'One divided by three is equal to {c:.2f}'

'One divided by three is equal to 0.33'

In [20]:
path = b'\path\folder1\folder2\file.ext \n line breaker'
print(path)

b'\\path\x0colder1\x0colder2\x0cile.ext \n line breaker'
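In the cell above, `\f` inside the literal was read as a form feed (`\x0c`), which is why `folder1` became `\x0colder1`; `\p` is not an escape, so it was kept as a backslash plus `p`. Normal `str` literals follow the same rules. A sketch of ways to write such a path safely; the paths are illustrative.

```python
# In a normal string literal, '\f' is the form-feed escape (ASCII 12)
broken = '\folder1\file.ext'
print('\x0c' in broken)           # True

# Ways to keep the backslashes, or avoid them
raw_path = r'\path\folder1\folder2\file.ext'          # raw string
escaped_path = '\\path\\folder1\\folder2\\file.ext'   # each backslash doubled
forward_path = '/path/folder1/folder2/file.ext'       # forward slashes, also accepted by open() on Windows

print(raw_path == escaped_path)   # True
```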


## 7.4 - Reading and Writing Files

<br/>

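Reading and writing files is a great way to automate tasks in science and engineering. However, every file that is opened must also be closed, and it is easy to leave one open by mistake (for example, when an error interrupts the code before close() is called). The **with** statement avoids this risk: it closes the file automatically when the block ends.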

#### Syntax of the with statement

```python
>>> path = 'C:/....'
>>> with open(path, 'r') as f:
...     content = f.read()
...
>>> print(content)
```
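A sketch showing that the file is already closed once the `with` block ends, next to the roughly equivalent `try`/`finally` form. It writes to a temporary directory so it does not touch real files; the file name and the explicit `encoding='utf-8'` are assumptions for the example.

```python
import os
import tempfile

# Work in a temporary directory so the example does not touch real files
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo.txt')

# The with statement closes the file when the block ends, even on an error
with open(path, 'w', encoding='utf-8') as f:
    f.write('first line\nsecond line\n')
print(f.closed)   # True

# Roughly what with does for you, written out by hand
g = open(path, 'r', encoding='utf-8')
try:
    content = g.read()
finally:
    g.close()     # runs whether or not read() raised an exception
print(g.closed)   # True
```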

**Notes:**

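<pre>Modes (second argument of open()):
- 'r': read (default)
- 'w': write (creates the file, or overwrites it if it already exists)
- 'wb': write in binary mode (the data must be bytes, not str)</pre>

<pre>Methods:
- read(): reads the whole file and returns a single string
- readlines(): reads the file and returns a list of lines, each keeping its trailing '\n'</pre>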
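A sketch of the modes and methods listed above: `'w'` replaces earlier content, `read()` and `readlines()` return a string and a list, iterating over the file gives one line at a time, and `'wb'` takes bytes. The temporary files and sample text are assumptions for the example.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'modes.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('old content\n')
with open(path, 'w', encoding='utf-8') as f:   # 'w' erases what was there
    f.write('line 1\nline 2\n')

with open(path, 'r', encoding='utf-8') as f:
    as_string = f.read()        # 'line 1\nline 2\n'
with open(path, 'r', encoding='utf-8') as f:
    as_list = f.readlines()     # ['line 1\n', 'line 2\n']
with open(path, 'r', encoding='utf-8') as f:
    stripped = [line.rstrip('\n') for line in f]   # ['line 1', 'line 2']

# 'wb' expects bytes, not str
bin_path = os.path.join(folder, 'modes.bin')
with open(bin_path, 'wb') as f:
    f.write('á'.encode('utf-8'))
with open(bin_path, 'rb') as f:
    raw_bytes = f.read()        # b'\xc3\xa1'
```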

In [38]:
path = r'D:\repositories\python-for-engineers-and-scientists\Aux_files\7.4\equipement_data.txt'
with open(path, 'r') as f:
    content = f.readlines()

In [41]:
len(content)

16

In [43]:
content

['Date,Equipment ID,Temperature (°C),Pressure (bar),Flow Rate (m³/h),Efficiency (%),Operating Hours\n',
 '2024-05-01,EQ-001,75.2,15.3,120.5,88.6,8.0\n',
 '2024-05-02,EQ-001,76.8,15.7,122.0,89.1,7.5\n',
 '2024-05-03,EQ-001,74.5,15.1,119.8,87.9,8.5\n',
 '2024-05-04,EQ-002,80.3,16.2,130.7,91.2,6.0\n',
 '2024-05-05,EQ-002,79.5,16.0,128.4,90.8,6.5\n',
 '2024-05-06,EQ-002,81.0,16.5,132.1,91.5,5.5\n',
 '2024-05-07,EQ-003,78.2,15.5,125.6,89.5,7.0\n',
 '2024-05-08,EQ-003,77.6,15.4,124.8,89.0,7.2\n',
 '2024-05-09,EQ-003,79.0,15.8,126.3,90.1,6.8\n',
 '2024-05-10,EQ-004,73.5,14.8,118.2,87.0,8.2\n',
 '2024-05-11,EQ-004,74.1,15.0,119.0,87.3,8.0\n',
 '2024-05-12,EQ-004,72.8,14.6,117.5,86.8,8.3\n',
 '2024-05-13,EQ-005,82.3,16.8,135.2,92.0,5.0\n',
 '2024-05-14,EQ-005,83.0,17.0,136.5,92.3,4.8\n',
 '2024-05-15,EQ-005,81.7,16.7,134.0,91.8,5.2']

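In [44]:
# Keep the header and the first 14 data rows; this drops the last row, which had no trailing '\n'
content = content[0:15]

In [45]:
# Re-add the last row, now ending in '\n', then append a new row
content.append('2024-05-15,EQ-005,81.7,16.7,134.0,91.8,5.2\n')
content.append('2024-05-15,EQ-005,81.7,16.7,134.0,120,24')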

In [46]:
content

['Date,Equipment ID,Temperature (°C),Pressure (bar),Flow Rate (m³/h),Efficiency (%),Operating Hours\n',
 '2024-05-01,EQ-001,75.2,15.3,120.5,88.6,8.0\n',
 '2024-05-02,EQ-001,76.8,15.7,122.0,89.1,7.5\n',
 '2024-05-03,EQ-001,74.5,15.1,119.8,87.9,8.5\n',
 '2024-05-04,EQ-002,80.3,16.2,130.7,91.2,6.0\n',
 '2024-05-05,EQ-002,79.5,16.0,128.4,90.8,6.5\n',
 '2024-05-06,EQ-002,81.0,16.5,132.1,91.5,5.5\n',
 '2024-05-07,EQ-003,78.2,15.5,125.6,89.5,7.0\n',
 '2024-05-08,EQ-003,77.6,15.4,124.8,89.0,7.2\n',
 '2024-05-09,EQ-003,79.0,15.8,126.3,90.1,6.8\n',
 '2024-05-10,EQ-004,73.5,14.8,118.2,87.0,8.2\n',
 '2024-05-11,EQ-004,74.1,15.0,119.0,87.3,8.0\n',
 '2024-05-12,EQ-004,72.8,14.6,117.5,86.8,8.3\n',
 '2024-05-13,EQ-005,82.3,16.8,135.2,92.0,5.0\n',
 '2024-05-14,EQ-005,83.0,17.0,136.5,92.3,4.8\n',
 '2024-05-15,EQ-005,81.7,16.7,134.0,91.8,5.2\n',
 '2024-05-15,EQ-005,81.7,16.7,134.0,120,24']

In [48]:
result = ''
for line in content:
    result += line



In [49]:
path2 = r'D:\repositories\python-for-engineers-and-scientists\Aux_files\7.4\equipement_data2.txt'
with open(path2, 'w') as f:
    f.write(result)
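The loop in `In [48]` concatenates the lines one by one. A sketch of two shorter equivalents, `''.join()` and `writelines()`, plus splitting a row into fields with `split(',')`. It uses three lines copied from the data above and a temporary file; the output file name and `encoding='utf-8'` are assumptions.

```python
import os
import tempfile

# Three lines copied from equipement_data.txt, header first
lines = [
    'Date,Equipment ID,Temperature (°C),Pressure (bar),Flow Rate (m³/h),Efficiency (%),Operating Hours\n',
    '2024-05-01,EQ-001,75.2,15.3,120.5,88.6,8.0\n',
    '2024-05-02,EQ-001,76.8,15.7,122.0,89.1,7.5\n',
]

# ''.join gives the same text as adding the lines in a loop
result = ''.join(lines)

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'equipment_copy.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.writelines(lines)        # writes each string as-is; it does not add '\n'

with open(path, 'r', encoding='utf-8') as f:
    read_back = f.read()
print(read_back == result)     # True

# Each line can be split into fields with split(',')
header = lines[0].strip().split(',')
first_row = lines[1].strip().split(',')
temperature = float(first_row[header.index('Temperature (°C)')])   # 75.2
```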

## 7.5 - Encoding

<br/>

Simply put, an encoding is a conversion table between characters (text) and the bytes (binary values) used to store them.

<br/>

#### Examples


| UTF-8 (binary, hex)       | ANSI / Windows-1252 (binary, hex) | ASCII (binary, hex) | Character |
| ------------------------- | --------------------------------- | ------------------- | --------- |
| 01000001 (41)             | 01000001 (41)                     | 01000001 (41)       | A         |
| 01011010 (5A)             | 01011010 (5A)                     | 01011010 (5A)       | Z         |
| 01100001 (61)             | 01100001 (61)                     | 01100001 (61)       | a         |
| 01111010 (7A)             | 01111010 (7A)                     | 01111010 (7A)       | z         |
| 11000011 10100001 (C3 A1) | 11100001 (E1)                     | --                  | á         |
| 11000011 10000001 (C3 81) | 11000001 (C1)                     | --                  | Á         |



<br/>

**Note:**
<pre>ASCII uses 7 bits: each character fits in 1 byte (8 bits) whose first bit is always 0, so the table has 128 values. It does not include accented characters.</pre>
<pre>ANSI (in practice, a Windows code page such as Windows-1252) uses all 8 bits of 1 byte, so it has 256 values: the 128 ASCII characters plus others, including accented letters.</pre>
<pre>UTF-8 is a variable-length (multibyte) encoding. The first 128 characters are the same as in ASCII and take 1 byte; other characters take 2, 3, or 4 bytes. UTF-8 can encode every character in Unicode, the standard maintained by the Unicode Consortium, whose code space has over a million positions (1,114,112 code points).</pre>
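A sketch that checks these notes in Python: how many bytes each character takes in UTF-8, the bit patterns of `á` from the table, and what happens when ASCII is asked for an accented letter. `'cp1252'` is Python's codec name for Windows-1252; the sample characters are illustrative.

```python
# Illustrative characters: 'A', 'á', the euro sign and an emoji (U+1F600)
samples = ['A', '\u00e1', '\u20ac', '\U0001F600']

# Number of bytes each character takes in UTF-8, keyed by its Unicode code point
sizes = {f'U+{ord(ch):04X}': len(ch.encode('utf-8')) for ch in samples}
print(sizes)   # {'U+0041': 1, 'U+00E1': 2, 'U+20AC': 3, 'U+1F600': 4}

# Bit patterns for 'á', matching the table: two bytes in UTF-8, one in Windows-1252
utf8_bits = [format(byte, '08b') for byte in '\u00e1'.encode('utf-8')]   # ['11000011', '10100001']
ansi_bits = [format(byte, '08b') for byte in '\u00e1'.encode('cp1252')]  # ['11100001']

# ASCII has no code for 'á'
try:
    '\u00e1'.encode('ascii')
except UnicodeEncodeError as err:
    print('ASCII cannot encode it:', err)
```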

<br/>

In [87]:
def binaryToDecimal(n):
    return int(n, 2)

In [99]:
hex(binaryToDecimal('10000001'))

'0x81'

In [96]:
a = 'á'

a.encode(encoding='UTF-8')

b'\xc3\xa1'
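`encode()` turns text into bytes, and `bytes.decode()` does the reverse. A sketch of what goes wrong when bytes are decoded with a different table than the one used to encode them. This is a common cause of garbled accents when a file is opened with the wrong `encoding=` argument.

```python
utf8_bytes = 'á'.encode('utf-8')          # b'\xc3\xa1'

ok = utf8_bytes.decode('utf-8')           # 'á'
mojibake = utf8_bytes.decode('cp1252')    # 'Ã¡': right bytes, wrong table

try:
    b'\xe1'.decode('utf-8')               # the Windows-1252 byte for 'á' is not valid UTF-8 on its own
except UnicodeDecodeError as err:
    print('Decoding failed:', err)

# When opening text files, naming the encoding avoids relying on the system default:
# open(path, 'r', encoding='utf-8')
```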

# Exercises

## E7.1
Given any email, `example@gmail.com`, return a list with the username and the domain name (without the top-level domain such as `.com`).

Output:

```
['example', 'gmail']
```

In [50]:
email = 'rafael@gmail.com'

In [54]:
email_format = email.split('@')

In [56]:
email_format[1] = email_format[1].split('.')[0]

In [57]:
email_format

['rafael', 'gmail']

In [58]:
def split_email(email):
    email_format = email.split('@')
    email_format[1] = email_format[1].split('.')[0]
    return email_format

In [59]:
split_email(email)

['rafael', 'gmail']

## E7.2
Given a file with emails, return a list with the usernames and domains.

Suggested data format:
```python
email_list = [[username1, domain1],
              [username2, domain2],
              [usernamen, domainn]
             ]
```

In [1]:
path = r'D:\repositories\python-for-engineers-and-scientists\Aux_files\E7.2\emails.txt'
with open(path, 'r') as f:
    emails = f.readlines()


In [2]:
len(emails)

1000

In [3]:
def split_email(email):
    email_format = email.split('@')
    email_format[1] = email_format[1].split('.')[0]
    return email_format

In [4]:
emails_format = []
for email in emails:
    emails_format.append(
        split_email(email)
    )
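`readlines()` keeps the trailing `'\n'` on each line, and a blank line (for example, at the end of the file) would make `split_email()` fail. A sketch of a variant that strips each line and skips blank ones, using a list comprehension. The input lines are hypothetical, written the way `readlines()` would return them.

```python
# Hypothetical lines, as readlines() would return them (note the '\n' and the blank line)
lines = ['rafael@gmail.com\n', 'example@gmail.com\n', '\n']

def split_email(email):
    """Return [username, domain name] for an address like 'user@domain.com'."""
    # Unpacking raises ValueError if the address does not contain exactly one '@'
    username, domain = email.strip().split('@')
    return [username, domain.split('.')[0]]

# Skip blank lines, then split each address
emails_format = [split_email(line) for line in lines if line.strip()]
print(emails_format)   # [['rafael', 'gmail'], ['example', 'gmail']]
```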

In [5]:
emails_format

[['jessica.thomas', 'quantumbridge'],
 ['thomas.martinez', 'solarflare'],
 ['mary.miller', 'quantumbridge'],
 ['robert.jones', 'techsparkle'],
 ['elizabeth.martinez', 'digitaldawn'],
 ['robert.anderson', 'greengrove'],
 ['william.smith', 'digitaldawn'],
 ['thomas.davis', 'digitaldawn'],
 ['linda.wilson', 'quantumbridge'],
 ['mary.williams', 'bluestream'],
 ['susan.davis', 'greengrove'],
 ['robert.miller', 'riverbend'],
 ['david.miller', 'riverbend'],
 ['patricia.rodriguez', 'echovalley'],
 ['patricia.johnson', 'digitaldawn'],
 ['patricia.martinez', 'bluestream'],
 ['mary.davis', 'echovalley'],
 ['linda.miller', 'riverbend'],
 ['john.martinez', 'solarflare'],
 ['mary.thomas', 'techsparkle'],
 ['mary.wilson', 'riverbend'],
 ['robert.wilson', 'greengrove'],
 ['thomas.jones', 'quantumbridge'],
 ['linda.smith', 'bluestream'],
 ['jessica.moore', 'bluestream'],
 ['elizabeth.thomas', 'bluestream'],
 ['thomas.thomas', 'digitaldawn'],
 ['jennifer.brown', 'bluestream'],
 ...]

## E7.3
Given the list of emails from the previous exercise, create a table with the following format:

```
name = Login               domain = Domain
name = login1              domain = domain1
name = login2              domain = domain2
name = loginn              domain = domainn
```

In [12]:
#emails_format

def pretty_print(info):
    name, domain = info
    spaces = 20 - len(name)
    result = 'name = ' + name + spaces*' ' + 'domain = ' + domain
    return result

In [13]:
pretty_print(['mary.davis', 'echovalley'])

'name = mary.davis          domain = echovalley'
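The padding in `pretty_print()` can also be done with the alignment specifier from section 7.3: `{name:<20}` left-aligns the name in a 20-character column. Like the original, it adds no padding when the name is longer than 20 characters. A sketch of that alternative:

```python
def pretty_print(info):
    """Format [name, domain] with the name left-aligned in a 20-character column."""
    name, domain = info
    return f'name = {name:<20}domain = {domain}'

print(pretty_print(['Login', 'Domain']))
print(pretty_print(['mary.davis', 'echovalley']))
# name = mary.davis          domain = echovalley
```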

In [14]:
for user in emails_format:
    print(
        pretty_print(user)
    )

name = jessica.thomas      domain = quantumbridge
name = thomas.martinez     domain = solarflare
name = mary.miller         domain = quantumbridge
name = robert.jones        domain = techsparkle
name = elizabeth.martinez  domain = digitaldawn
name = robert.anderson     domain = greengrove
name = william.smith       domain = digitaldawn
name = thomas.davis        domain = digitaldawn
name = linda.wilson        domain = quantumbridge
name = mary.williams       domain = bluestream
name = susan.davis         domain = greengrove
name = robert.miller       domain = riverbend
name = david.miller        domain = riverbend
name = patricia.rodriguez  domain = echovalley
name = patricia.johnson    domain = digitaldawn
name = patricia.martinez   domain = bluestream
name = mary.davis          domain = echovalley
name = linda.miller        domain = riverbend
name = john.martinez       domain = solarflare
name = mary.thomas         domain = techsparkle
name = mary.wilson         domain = riverbend
...

## E7.4
Using the email file from the previous exercises, count how many emails belong to each domain.

Tip: Use the concepts of loops and data structures from previous chapters.

In [16]:
#emails_format