# String Manipulations & Files

### Our Task
Sync the subtitles to a video, i.e. shift the appearance of the subtitles in time (e.g. by 1.5 seconds).

We have `the-good-the-bad-and-the-ugly-1966.srt`, a file of the `.srt` type. It looks like this:
```
...

986
02:02:23,419 --> 02:02:27,339
Now I find you in exactly
the position that suits me.

987
02:02:27,423 --> 02:02:31,092
I had lots of time to learn
how to shoot with my left.

988
02:02:47,151 --> 02:02:49,277
When you have to shoot,
shoot. Don't talk.

989
02:02:53,366 --> 02:02:55,950
Every gun makes its own tune.

...

```

Every subtitle quote is made of a few lines:
1. First line - `index`
2. Second line - timing - `<start_time> --> <end_time>`
3. Third line onward (one or more lines) - the text itself

The subtitle quotes are separated by an empty line.

### How are we going to do that?
1. **String Manipulations** - Manipulate the timing string, i.e. only the second line.
2. **Files** - Load the subtitle file, manipulate it and save it.

### Textbook References
http://www.diveintopython3.net/strings.html#divingin (Sections 4.3-4.5 only)

## String Manipulations

Let's take this quote:
```
988
02:02:47,151 --> 02:02:49,277
When you have to shoot,
shoot. Don't talk.
```

and shift its time (both `start_time` and `end_time`) by 12 seconds.

We will do that step by step in the notebook, and then transform our code into a function.

### Line by line

First, we will work only on the timing line.

In [29]:
timing = '02:02:47,151 --> 02:02:49,277'

In [30]:
len(timing)

29

In [31]:
timing[0], timing[1], timing[2]

('0', '2', ':')

In [32]:
timing[-1], timing[-2], timing[-3]

('7', '7', '2')

`find` returns the index of the first occurrence:

In [33]:
timing.find(':')

2

`index` is another option for getting the index of the first occurrence:

In [34]:
timing.index(':')

2

However, `find` and `index` behave differently when the sub-string is not in the string.
`find` returns `-1`, and `index` raises an exception:

In [35]:
timing.find('.')

-1

In [36]:
timing.index('.')

ValueError: substring not found
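
As a minimal sketch of how this difference shows up in practice: with `find` we compare the result to the `-1` sentinel, while with `index` we catch the `ValueError`. The variable names `find_result` and `index_result` are hypothetical and used only for illustration.

```python
timing = '02:02:47,151 --> 02:02:49,277'

# find: test the return value against the -1 sentinel
position = timing.find('.')
if position == -1:
    find_result = 'not found'
else:
    find_result = position

# index: catch the exception instead of testing a return value
try:
    index_result = timing.index('.')
except ValueError:
    index_result = 'not found'

print(find_result, index_result)
```

Forgetting the `-1` check is a common source of bugs, because `-1` is also a valid index (the last character).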

#### Let's work on the timing strings

In [37]:
start_time, end_time = timing[:12], timing[-12:]
start_time, end_time

('02:02:47,151', '02:02:49,277')

In [38]:
start_time, end_time = timing.split(' --> ')
start_time, end_time

('02:02:47,151', '02:02:49,277')

In [39]:
hours, minutes, seconds = start_time.split(':')
hours, minutes, seconds

('02', '02', '47,151')

In [40]:
seconds, millisecond = seconds.split(',')
seconds, millisecond

('47', '151')

In [41]:
hours = int(hours)
minutes = int(minutes)
seconds = int(seconds)
millisecond = int(millisecond)

In [42]:
seconds += 12

In [43]:
shifted_start_time = str(hours)   + ':' +  \
                     str(minutes) + ':' +  \
                     str(seconds) + ',' +  \
                     str(millisecond)

In [44]:
shifted_start_time

'2:2:59,151'

#### Reference: [Python String Formatting - Padding Numbers](https://pyformat.info/#number_padding)

In [45]:
shifted_start_time = '{:02d}:{:02d}:{:02d},{:03d}'.format(hours,
                                                          minutes,
                                                          seconds,
                                                          millisecond)

In [46]:
shifted_start_time

'02:02:59,151'
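
The same padding specifiers also work inside an f-string. The sketch below assumes Python 3.6 or later and re-creates the four integer values by hand so that it can run on its own; `with_format` and `with_fstring` are hypothetical names.

```python
hours, minutes, seconds, millisecond = 2, 2, 59, 151

# str.format, as used in the cell above
with_format = '{:02d}:{:02d}:{:02d},{:03d}'.format(hours, minutes,
                                                   seconds, millisecond)

# f-string with the same format specifiers after the colon
with_fstring = f'{hours:02d}:{minutes:02d}:{seconds:02d},{millisecond:03d}'

print(with_format)
print(with_fstring)
```

Both lines print the same zero-padded string; which form to use is mostly a matter of taste.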

In [47]:
def shift_timing(time_str, delta):
    hours, minutes, seconds = time_str.split(':')
    seconds, millisecond = seconds.split(',')
    
    hours = int(hours)
    minutes = int(minutes)
    seconds = int(seconds)
    millisecond = int(millisecond)
    
    seconds += delta
    
    shifted_time = '{:02d}:{:02d}:{:02d},{:03d}'.format(hours,
                                                        minutes,
                                                        seconds,
                                                        millisecond)
    
    return shifted_time

In [48]:
shift_timing(start_time, delta=12)

'02:02:59,151'

In [49]:
shift_timing(end_time, delta=12)

'02:02:61,277'

#### "Blondie, we have a problem!"

When `seconds` exceeds 59 (i.e. 60 and greater), we should convert the extra seconds into minutes.
In the same manner, convert extra minutes into hours.

In [50]:
seconds = 61

In [51]:
seconds/60

1.0166666666666666

In [52]:
seconds//60

1

In [53]:
seconds%60

1
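
The built-in `divmod` returns the results of `//` and `%` together, which can make the carry-over step a little more compact. This is only an alternative sketch; the names `extra_minutes` and `remaining_seconds` are hypothetical.

```python
seconds = 61

# divmod(a, b) returns the pair (a // b, a % b)
extra_minutes, remaining_seconds = divmod(seconds, 60)
print(extra_minutes, remaining_seconds)

# Floor division rounds down, so a negative value borrows a minute:
# -7 seconds is "minus one minute, plus 53 seconds"
borrowed_minutes, wrapped_seconds = divmod(-7, 60)
print(borrowed_minutes, wrapped_seconds)
```

The second call hints at why the same arithmetic can also cope with a negative shift, as long as the result does not go below zero hours.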

In [54]:
def shift_timing(time_str, delta):
    hours, minutes, seconds = time_str.split(':')
    seconds, millisecond = seconds.split(',')
    
    hours = int(hours)
    minutes = int(minutes)
    seconds = int(seconds)
    millisecond = int(millisecond)
    
    seconds += delta
    
    minutes += seconds//60
    seconds %= 60
    
    hours += minutes//60
    minutes %= 60
    
    shifted_time = '{:02d}:{:02d}:{:02d},{:03d}'.format(hours,
                                                        minutes,
                                                        seconds,
                                                        millisecond)
    
    return shifted_time

In [55]:
shift_timing(start_time, delta=12)

'02:02:59,151'

In [56]:
shift_timing(end_time, delta=12)

'02:03:01,277'

### Exercises
1. Change `delta` to be in milliseconds. The conversions should be changed accordingly.
2. Change `delta` to be a string in the format of the timing line, i.e. `hours:minutes:seconds,millisecond`

## Handling the whole quote

Now we will shift the whole quote in time.

In [57]:
quote = '''988
02:02:47,151 --> 02:02:49,277
When you have to shoot,
shoot. Don't talk.'''

print(quote)

988
02:02:47,151 --> 02:02:49,277
When you have to shoot,
shoot. Don't talk.


In [58]:
quote_lines = quote.splitlines()
quote_lines

['988',
 '02:02:47,151 --> 02:02:49,277',
 'When you have to shoot,',
 "shoot. Don't talk."]

In [59]:
timing = quote.splitlines()[1]
timing

'02:02:47,151 --> 02:02:49,277'

In [60]:
start_time, end_time = timing.split(' --> ')
start_time, end_time

('02:02:47,151', '02:02:49,277')

In [61]:
shifted_start_time = shift_timing(start_time, 12)
shifted_start_time

'02:02:59,151'

In [62]:
shifted_end_time = shift_timing(end_time, 12)
shifted_end_time

'02:03:01,277'

In [63]:
shifted_timing = shifted_start_time + ' --> ' + shifted_end_time
shifted_timing

'02:02:59,151 --> 02:03:01,277'

In [64]:
quote_lines[1] = shifted_timing
quote_lines

['988',
 '02:02:59,151 --> 02:03:01,277',
 'When you have to shoot,',
 "shoot. Don't talk."]

In [65]:
quote

"988\n02:02:47,151 --> 02:02:49,277\nWhen you have to shoot,\nshoot. Don't talk."

In [66]:
shifted_quote = '\n'.join(quote_lines)
shifted_quote

"988\n02:02:59,151 --> 02:03:01,277\nWhen you have to shoot,\nshoot. Don't talk."

In [67]:
print(shifted_quote)

988
02:02:59,151 --> 02:03:01,277
When you have to shoot,
shoot. Don't talk.


In [68]:
def shift_quote(quote, delta):
    quote_lines = quote.split('\n')
    start_time, end_time = quote_lines[1].split(' --> ')
    
    shifted_start_time = shift_timing(start_time, delta)
    shifted_end_time   = shift_timing(end_time, delta)
    shifted_timing = shifted_start_time + ' --> ' + shifted_end_time

    quote_lines[1] = shifted_timing
    
    shifted_quote = '\n'.join(quote_lines)
    
    return shifted_quote

In [69]:
shifted_quote = shift_quote(quote, 12)
shifted_quote

"988\n02:02:59,151 --> 02:03:01,277\nWhen you have to shoot,\nshoot. Don't talk."

In [70]:
print(shifted_quote)

988
02:02:59,151 --> 02:03:01,277
When you have to shoot,
shoot. Don't talk.


## Files

## Reading

### Alternative #1: `read`

In [71]:
f = open('the-good-the-bad-and-the-ugly-1966.srt', 'r')

In [72]:
content = f.read()
print(content[:100])

1
00:02:59,512 --> 00:03:01,096
(COYOTE HOWLING)

2
00:05:35,585 --> 00:05:37,753
(GUNSHOTS FIRING)



In [73]:
len(content)

80671

In [74]:
content[:280]

"1\n00:02:59,512 --> 00:03:01,096\n(COYOTE HOWLING)\n\n2\n00:05:35,585 --> 00:05:37,753\n(GUNSHOTS FIRING)\n\n3\n00:05:51,976 --> 00:05:53,310\n(GROANS)\n\n4\n00:10:34,216 --> 00:10:36,718\nYou're from Baker?\n\n5\n00:10:48,230 --> 00:10:50,523\nTell Baker that\nI told him all\nthat I know already.\n\n"

In [75]:
f.close()

### Alternative #2: `readlines`

In [76]:
f = open('the-good-the-bad-and-the-ugly-1966.srt', 'r')

In [77]:
content_lines = f.readlines()
content_lines[:10]

['1\n',
 '00:02:59,512 --> 00:03:01,096\n',
 '(COYOTE HOWLING)\n',
 '\n',
 '2\n',
 '00:05:35,585 --> 00:05:37,753\n',
 '(GUNSHOTS FIRING)\n',
 '\n',
 '3\n',
 '00:05:51,976 --> 00:05:53,310\n']

In [78]:
f.close()

### Alternative #3: `for line in f`

In [79]:
f = open('the-good-the-bad-and-the-ugly-1966.srt', 'r')

In [80]:
for line in f:
    print(line)
    # do something with the line

1

00:02:59,512 --> 00:03:01,096

(COYOTE HOWLING)



2

00:05:35,585 --> 00:05:37,753

(GUNSHOTS FIRING)



3

00:05:51,976 --> 00:05:53,310

(GROANS)



4

00:10:34,216 --> 00:10:36,718

You're from Baker?



5

00:10:48,230 --> 00:10:50,523

Tell Baker that

I told him all

that I know already.



6

00:10:50,608 --> 00:10:52,233

Tell him I want

to live in peace,

understand?



7

00:10:52,318 --> 00:10:55,236

There is no use to

go on tormenting me!



8

00:10:55,321 --> 00:10:58,156

I know nothing at all

about that case of coins.



9

00:10:58,240 --> 00:10:59,699

Now that gold

has disappeared,

but if he'd listened,



10

00:10:59,784 --> 00:11:01,910

we could have

avoided this altogether.



11

00:11:03,496 --> 00:11:05,246

I went to the army court.



12

00:11:05,331 --> 00:11:07,040

There were no witnesses.



13

00:11:07,124 --> 00:11:09,459

They couldn't

uncover any more.



14

00:11:09,543 --> 00:11:12,462

I can't tell Baker

what happened to the money

In [81]:
f.close()

In [82]:
f = open('the-good-the-bad-and-the-ugly-1966.srt', 'r')

In [83]:
for line in f:
    print(repr(line))
    # do something with the line

'1\n'
'00:02:59,512 --> 00:03:01,096\n'
'(COYOTE HOWLING)\n'
'\n'
'2\n'
'00:05:35,585 --> 00:05:37,753\n'
'(GUNSHOTS FIRING)\n'
'\n'
'3\n'
'00:05:51,976 --> 00:05:53,310\n'
'(GROANS)\n'
'\n'
'4\n'
'00:10:34,216 --> 00:10:36,718\n'
"You're from Baker?\n"
'\n'
'5\n'
'00:10:48,230 --> 00:10:50,523\n'
'Tell Baker that\n'
'I told him all\n'
'that I know already.\n'
'\n'
'6\n'
'00:10:50,608 --> 00:10:52,233\n'
'Tell him I want\n'
'to live in peace,\n'
'understand?\n'
'\n'
'7\n'
'00:10:52,318 --> 00:10:55,236\n'
'There is no use to\n'
'go on tormenting me!\n'
'\n'
'8\n'
'00:10:55,321 --> 00:10:58,156\n'
'I know nothing at all\n'
'about that case of coins.\n'
'\n'
'9\n'
'00:10:58,240 --> 00:10:59,699\n'
'Now that gold\n'
'has disappeared,\n'
"but if he'd listened,\n"
'\n'
'10\n'
'00:10:59,784 --> 00:11:01,910\n'
'we could have\n'
'avoided this altogether.\n'
'\n'
'11\n'
'00:11:03,496 --> 00:11:05,246\n'
'I went to the army court.\n'
'\n'
'12\n'
'00:11:05,331 --> 00:11:07,040\n'
'There were no 

To get rid of the `'\n'` at the end of each line, we can use the string method `strip`:

In [84]:
'00:02:59,512 --> 00:03:01,096\n'.strip()

'00:02:59,512 --> 00:03:01,096'
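
Note that `strip` removes all leading and trailing whitespace, not only the newline. A small sketch of the difference, using a made-up line with extra spaces (the sample text and variable names are hypothetical):

```python
line = '  indented subtitle text \n'

both_sides = line.strip()         # whitespace removed on both ends
newline_only = line.rstrip('\n')  # only trailing newlines removed
right_side = line.rstrip()        # all trailing whitespace removed

print(repr(both_sides))
print(repr(newline_only))
print(repr(right_side))
```

For the timing lines in this file the three calls give the same result; the distinction matters only when leading or trailing spaces should be kept.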

### Chosen: #1!
### `read` with `split('\n\n')`
Note: with a really big file, it might be more reasonable to use the **third** alternative, because the whole file content might not fit in memory.
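
For reference, here is one possible sketch of how the third alternative could group lines into quotes without holding the whole file in memory. The generator name `iter_quotes` is hypothetical, and a short list of lines stands in for the open file object; it assumes quotes are separated by completely empty lines.

```python
def iter_quotes(lines):
    """Yield one quote at a time, without its trailing blank line."""
    current = []
    for line in lines:
        line = line.rstrip('\n')
        if line:
            current.append(line)
        elif current:
            # an empty line closes the current quote
            yield '\n'.join(current)
            current = []
    if current:
        # the input did not end with an empty line
        yield '\n'.join(current)


# stand-in for `for line in f`; the lines mimic the start of the file
sample_lines = ['1\n', '00:02:59,512 --> 00:03:01,096\n', '(COYOTE HOWLING)\n', '\n',
                '2\n', '00:05:35,585 --> 00:05:37,753\n', '(GUNSHOTS FIRING)\n', '\n']

for sample_quote in iter_quotes(sample_lines):
    print(repr(sample_quote))
```

With a real file, the open file object would be passed in place of `sample_lines`, so only one quote is kept in memory at a time.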

In [85]:
quotes = content.split('\n\n')
quotes[:5]

['1\n00:02:59,512 --> 00:03:01,096\n(COYOTE HOWLING)',
 '2\n00:05:35,585 --> 00:05:37,753\n(GUNSHOTS FIRING)',
 '3\n00:05:51,976 --> 00:05:53,310\n(GROANS)',
 "4\n00:10:34,216 --> 00:10:36,718\nYou're from Baker?",
 '5\n00:10:48,230 --> 00:10:50,523\nTell Baker that\nI told him all\nthat I know already.']

In [86]:
shift_quote(quotes[0], 12)

'1\n00:03:11,512 --> 00:03:13,096\n(COYOTE HOWLING)'

In [87]:
shifted_quotes = []

for quote in quotes:
    shifted_quotes.append(shift_quote(quote, 12))

IndexError: list index out of range

In [88]:
shifted_quotes = []

for quote in quotes:
    print(quote)
    shifted_quotes.append(shift_quote(quote, 12))

1
00:02:59,512 --> 00:03:01,096
(COYOTE HOWLING)
2
00:05:35,585 --> 00:05:37,753
(GUNSHOTS FIRING)
3
00:05:51,976 --> 00:05:53,310
(GROANS)
4
00:10:34,216 --> 00:10:36,718
You're from Baker?
5
00:10:48,230 --> 00:10:50,523
Tell Baker that
I told him all
that I know already.
6
00:10:50,608 --> 00:10:52,233
Tell him I want
to live in peace,
understand?
7
00:10:52,318 --> 00:10:55,236
There is no use to
go on tormenting me!
8
00:10:55,321 --> 00:10:58,156
I know nothing at all
about that case of coins.
9
00:10:58,240 --> 00:10:59,699
Now that gold
has disappeared,
but if he'd listened,
10
00:10:59,784 --> 00:11:01,910
we could have
avoided this altogether.
11
00:11:03,496 --> 00:11:05,246
I went to the army court.
12
00:11:05,331 --> 00:11:07,040
There were no witnesses.
13
00:11:07,124 --> 00:11:09,459
They couldn't
uncover any more.
14
00:11:09,543 --> 00:11:12,462
I can't tell Baker
what happened to the money.
15
00:11:12,546 --> 00:11:14,047
Go back and tell him that.
16
00:11:16,926 

IndexError: list index out of range

In [89]:
quotes[-3:]

['1222\n02:56:24,532 --> 02:56:27,450\nYou know what you are?',
 '1223\n02:56:31,038 --> 02:56:34,165\nJust a dirty son of a bitch !',
 '']

In [90]:
'X\n\nY\n\nZ\n\n'.split('\n\n')

['X', 'Y', 'Z', '']

In [91]:
'X\n\nY\n\nZ'.split('\n\n')

['X', 'Y', 'Z']
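
Besides slicing off the last element, there are other ways to avoid the empty trailing item. Two hedged alternatives are sketched here on the toy string; `quotes_stripped` and `quotes_filtered` are hypothetical names.

```python
toy_content = 'X\n\nY\n\nZ\n\n'

# Option 1: remove surrounding whitespace before splitting
quotes_stripped = toy_content.strip().split('\n\n')

# Option 2: split first, then keep only the non-empty pieces
quotes_filtered = [piece for piece in toy_content.split('\n\n') if piece]

print(quotes_stripped)
print(quotes_filtered)
```

Unlike slicing with `[:-1]`, both options also behave correctly when the content does not end with an empty line.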

In [92]:
shifted_quotes = []

for quote in quotes[:-1]:
    shifted_quotes.append(shift_quote(quote, 12))

In [93]:
shifted_content = '\n\n'.join(shifted_quotes)
shifted_content[:280]

"1\n00:03:11,512 --> 00:03:13,096\n(COYOTE HOWLING)\n\n2\n00:05:47,585 --> 00:05:49,753\n(GUNSHOTS FIRING)\n\n3\n00:06:03,976 --> 00:06:05,310\n(GROANS)\n\n4\n00:10:46,216 --> 00:10:48,718\nYou're from Baker?\n\n5\n00:11:00,230 --> 00:11:02,523\nTell Baker that\nI told him all\nthat I know already.\n\n"

In [94]:
print(shifted_content[:280])

1
00:03:11,512 --> 00:03:13,096
(COYOTE HOWLING)

2
00:05:47,585 --> 00:05:49,753
(GUNSHOTS FIRING)

3
00:06:03,976 --> 00:06:05,310
(GROANS)

4
00:10:46,216 --> 00:10:48,718
You're from Baker?

5
00:11:00,230 --> 00:11:02,523
Tell Baker that
I told him all
that I know already.




In [95]:
shifted_content[-30:]

'\nJust a dirty son of a bitch !'

In [96]:
content[-30:]

'ust a dirty son of a bitch !\n\n'

In [97]:
shifted_content += '\n\n'
shifted_content[-30:]

'ust a dirty son of a bitch !\n\n'

In [98]:
def shift_str_file(filename, delta):
    f = open(filename, 'r')
    content = f.read()
    f.close()
    
    quotes = content.split('\n\n')
    
    shifted_quotes = []
    for quote in quotes[:-1]:
        shifted_quotes.append(shift_quote(quote, delta))
        
    shifted_content = '\n\n'.join(shifted_quotes)
    shifted_content += '\n\n'
    
    return shifted_content

In [99]:
quotes[-2:]

['1223\n02:56:31,038 --> 02:56:34,165\nJust a dirty son of a bitch !', '']

In [100]:
shifted_content = shift_str_file('the-good-the-bad-and-the-ugly-1966.srt', 12)
shifted_content[:280]

"1\n00:03:11,512 --> 00:03:13,096\n(COYOTE HOWLING)\n\n2\n00:05:47,585 --> 00:05:49,753\n(GUNSHOTS FIRING)\n\n3\n00:06:03,976 --> 00:06:05,310\n(GROANS)\n\n4\n00:10:46,216 --> 00:10:48,718\nYou're from Baker?\n\n5\n00:11:00,230 --> 00:11:02,523\nTell Baker that\nI told him all\nthat I know already.\n\n"

In [101]:
len(shifted_content)

80671

### Exercise
Use the other alternatives for reading the file, i.e. with `readlines` or `for line in f`.

## Writing

In [102]:
f = open('SHIFTED-12-the-good-the-bad-and-the-ugly-1966.srt', 'w')

In [103]:
f.write(shifted_content)

80671

In [104]:
f.close()

In [105]:
def shift_str_file(filename, delta):
    f = open(filename, 'r')
    content = f.read()
    f.close()
    
    quotes = content.split('\n\n')
    
    shifted_quotes = []
    for quote in quotes[:-1]:
        shifted_quotes.append(shift_quote(quote, delta))
        
    shifted_content = '\n\n'.join(shifted_quotes)
    shifted_content += '\n\n'
    
    shifted_filename = 'SHIFTED-' + str(delta) + '-' + filename
    
    f = open(shifted_filename, 'w')
    f.write(shifted_content)
    f.close()

In [106]:
shift_str_file('the-good-the-bad-and-the-ugly-1966.srt', 20)

## `with`

In [107]:
with open('the-good-the-bad-and-the-ugly-1966.srt', 'r') as f:
    content = f.read()
    
print(content[:280])

1
00:02:59,512 --> 00:03:01,096
(COYOTE HOWLING)

2
00:05:35,585 --> 00:05:37,753
(GUNSHOTS FIRING)

3
00:05:51,976 --> 00:05:53,310
(GROANS)

4
00:10:34,216 --> 00:10:36,718
You're from Baker?

5
00:10:48,230 --> 00:10:50,523
Tell Baker that
I told him all
that I know already.




In [108]:
def shift_str_file(filename, delta):
    with open(filename, 'r') as f:
        content = f.read()
    
    quotes = content.split('\n\n')
    
    shifted_quotes = []
    for quote in quotes[:-1]:
        shifted_quotes.append(shift_quote(quote, delta))
        
    shifted_content = '\n\n'.join(shifted_quotes)
    shifted_content += '\n\n'
    
    shifted_filename = 'SHIFTED-' + str(delta) + '-' + filename
    
    with open(shifted_filename, 'w') as f:
        f.write(shifted_content)
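
One caveat: building the output name with `'SHIFTED-' + str(delta) + '-' + filename` only works when `filename` has no directory part. A possible sketch of a more robust name builder, using the standard `pathlib` module, is shown below; the helper `shifted_path` and the `subs/` directory are hypothetical.

```python
from pathlib import Path


def shifted_path(filename, delta):
    """Return the output path, with the prefix applied to the file name only."""
    path = Path(filename)
    return path.with_name('SHIFTED-' + str(delta) + '-' + path.name)


output_path = shifted_path('subs/the-good-the-bad-and-the-ugly-1966.srt', 12)
print(output_path.name)
print(output_path.parent)
```

The returned `Path` object can be passed directly to `open`, so the rest of the function would not need to change.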