To open a file in Python, you need the open() function, which returns a file object. It
is normally called with two arguments, the file name and the mode:

In [None]:
open(FILE NAME, MODE)

The four most important modes are the following:

• 'r' - The file is opened for reading. Since reading is the default, this
specification can be omitted.

• 'w' - The file is opened for writing. If a file with the specified name already exists, its
contents are overwritten.

• 'a' - The file is opened for writing. If a file with the specified name already exists, the new
data is appended to the end of the file.

• 'x' - The file is created and opened for writing. If it already exists, a FileExistsError is raised.
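
The effect of the four basic modes ('r', 'w', 'a', 'x') can be tried out in a throwaway directory. The following is a minimal sketch; the file name demo.txt and the use of the tempfile module are assumptions made for illustration only.

```python
import os
import tempfile

# Work in a temporary directory so that no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(path, "x") as f:   # 'x': create the file; it must not exist yet
        f.write("first\n")

    with open(path, "a") as f:   # 'a': append to the existing content
        f.write("second\n")

    with open(path, "r") as f:   # 'r': read (this is the default mode)
        after_append = f.read()

    with open(path, "w") as f:   # 'w': discard the old content, then write
        f.write("third\n")

    with open(path) as f:
        after_write = f.read()

    try:
        open(path, "x")          # the file exists now, so 'x' fails
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(after_append))
print(repr(after_write))
print(x_failed)
```

After the append the file holds both lines; after reopening it with 'w' only the last line is left.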

Each of these modes can be combined with one of the following two specifications:

• 't': This stands for text mode, which is the default. When you open a file in text mode ('t'), Python reads or writes the file as text. The file is expected to contain human-readable text, and Python handles newline characters (\n) in a platform-specific way (converting them to the appropriate newline character(s) for the operating system).

• 'b': This stands for binary mode. When you open a file in binary mode ('b'), Python reads or writes the file as a sequence of bytes. Binary mode is used when dealing with non-text files or when you want to handle binary data explicitly. In binary mode, newline characters are not modified, and the data is read or written as-is.


In [None]:
with open('example.txt', 'rt') as file:
    content = file.read()

with open('example.bin', 'rb') as file:
    content = file.read()

    

In [None]:
f = open('test', mode='w+', encoding='utf-8')  # Opened for reading and writing; existing content is discarded
f.close()

When opening a file with open(), make sure that it is closed again after the intended operations on
the file.

In [None]:
f = open('my_file.txt')  # Text file is opened for reading
# <PROCESSING OF DATA>
f.close()  # The close() method closes the file or stream


This process can be simplified by using a with statement. In this case, the file is closed automatically as soon as the block is finished, even if an error occurs inside the block. You should use with statements whenever possible.

In [None]:
with open('my_file.txt') as f:
    pass  # <PROCESSING OF DATA>
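
Roughly speaking, the with statement does by itself what a try/finally construction does by hand. The following is a simplified sketch of that idea, not the exact definition of the with statement; the file is created in a temporary directory, which is an assumption made for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_file.txt")

    # Manual variant: close() is called even if the processing raises an error.
    f = open(path, "w")
    try:
        f.write("some data")
    finally:
        f.close()
    closed_manually = f.closed

    # with variant: the file is closed when the block is left.
    with open(path) as g:
        content = g.read()
    closed_by_with = g.closed

print(closed_manually, closed_by_with, content)
```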

A file object created by open() has a number of properties that can be accessed as
attributes. The following attributes are available on a file object f_obj:
• f_obj.closed evaluates to True if and only if the file has been closed; otherwise to False.
• f_obj.mode returns the mode that was used when the file was opened.
• f_obj.name returns the name of the associated file.

In [None]:
f_obj = open("test", 'x')  # Raises FileExistsError if "test" already exists
f_obj.closed
[ ]: False

[ ]: f_obj.close()
f_obj.closed
[ ]: True

[ ]: f_obj.name
[ ]: 'test'

[ ]: f_obj.mode
[ ]: 'x'



In [None]:
with open('my_file.txt') as f:
    content = f.read()

In this example, the open() function is used to open the file named 'my_file.txt'. The with statement ensures that the file is properly closed after reading its content. The read() method reads the entire content of the file and stores it in the variable content.

In [None]:
with open('my_file.txt') as f:
    while True:
        char = f.read(1) # Read the next character from the file
        if not char: break

When reaching the end of the file, read(n) returns an empty string. Contrary to expectations, in
this situation further attempts to read do not lead to an error message, but also evaluate to an
empty string.
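
This behaviour can be checked with a very small input. The sketch below uses io.StringIO as a stand-in for a text file opened for reading, which is an assumption made here to avoid creating a real file.

```python
import io

# io.StringIO behaves like a text file opened for reading.
f = io.StringIO("ab")

first = f.read(1)    # 'a'
second = f.read(1)   # 'b'
at_end = f.read(1)   # '' because the end of the data has been reached
again = f.read(1)    # '' again, no exception is raised

print(repr(first), repr(second), repr(at_end), repr(again))
```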


It is even easier to process a file line by line:

In [None]:
with open('my_file.txt') as f:
    for line in f:
        print(line.rstrip())

The rstrip() method is applied to each line to remove trailing whitespace, including the newline character at the end of the line. The stripped line is then printed to the console.

Difference between read(), readline() and readlines()

In text mode, read() returns the file contents as a single string, readline() returns one line at a time, and readlines() returns all lines as a list of strings.

If readline() is used with a numeric argument n, a maximum of n characters are read, as with read(); however, unlike read(), it never reads beyond the end of the line.

readlines() also accepts a numeric argument (default value: -1, which reads all lines of the file): lines are then read up to and including the line that contains the specified byte/character.



In [None]:
with open("text_sample.txt") as file:
    print(file.readlines(7)) 

In [None]:
with open("text_sample.txt") as file:
    print(file.read(18))

with open("text_sample.txt") as file:
    print(file.readline(18))

with open("text_sample.txt") as file:
    print(file.readlines())
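
To make the differences visible, here is a small sketch with known input. The sample content is hypothetical and is not the content of text_sample.txt; io.StringIO is used as a stand-in for a text file.

```python
import io

sample = "Only one\nsecond line\nthird\n"  # hypothetical file content

# read(n): at most n characters, line breaks included
chars = io.StringIO(sample).read(12)

# readline(n): at most n characters, but never beyond the end of the line
partial_line = io.StringIO(sample).readline(4)
whole_line = io.StringIO(sample).readline(12)

# readlines(): all lines as a list of strings
all_lines = io.StringIO(sample).readlines()

# readlines(n): stops after the line in which the n-th character lies
some_lines = io.StringIO(sample).readlines(12)

print(repr(chars))         # 'Only one\nsec'
print(repr(partial_line))  # 'Only'
print(repr(whole_line))    # 'Only one\n'
print(all_lines)           # ['Only one\n', 'second line\n', 'third\n']
print(some_lines)          # ['Only one\n', 'second line\n']
```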

tell() and seek()

The tell()-method, which is called without arguments, can be used to determine the current position
of the file pointer.

datei = open("text_sample.txt")
datei.read(4)
'Only'

datei.tell()
4

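The seek() method accepts up to two integer arguments: the value of the first argument
determines the number of bytes by which the file pointer is to be moved; the value of the
second, optional argument defines the starting point for this movement (0: start of the file,
which is the default; 1: current position; 2: end of the file). In text mode, only positions
relative to the start of the file are generally allowed.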

datei.seek(7)
 7
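
The second argument of seek() is easiest to try out in binary mode. The sketch below uses io.BytesIO as a stand-in for a file opened with 'rb'; the sample data is hypothetical.

```python
import io

# io.BytesIO stands in for a file opened in binary mode ('rb').
f = io.BytesIO(b"Only a test")

f.seek(5)                 # 5 bytes from the start (second argument defaults to 0)
pos_start = f.tell()      # 5

f.seek(2, 1)              # 2 bytes forward from the current position
pos_current = f.tell()    # 7

f.seek(-4, 2)             # 4 bytes before the end of the data
pos_end = f.tell()        # 7, since the data is 11 bytes long
tail = f.read()           # b'test'

print(pos_start, pos_current, pos_end, tail)
```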





write() and writelines()
write() takes a string, and writelines() takes a sequence (list, tuple) of strings as an argument; both write their argument to a file. Note that writelines() does not add line separators:



In [None]:
with open("second_test.txt", "w") as file:
    file.write("I don't like")
    file.writelines([" Mondays\n", "The Boomtown Rats\n"])  # Line breaks must be written explicitly

JSON
JSON stands for JavaScript Object Notation, but it is a (text-based) format that is supported
by many programming languages and thus also enables data exchange between programmes
written in different languages. The functions json.dump() and json.load() are used to save
and load the data:
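
A minimal sketch of saving and loading with json.dump() and json.load() could look as follows. The data, the file name data.json and the temporary directory are assumptions made for illustration.

```python
import json
import os
import tempfile

data = {"Integer": 17, "Float": 3.14, "String": "What is this?", "List": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.json")  # hypothetical file name

    # json.dump() writes the object to a file opened in text mode.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)

    # json.load() reads the file and rebuilds the Python object.
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == data)
```

JSON only knows a few data types (objects, arrays, strings, numbers, booleans and null), so a tuple, for example, comes back as a list.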


Pickle

Python’s pickle module allows many types of Python objects to be serialised and deserialised. In contrast to JSON, a binary format is used for this, which means that the file
used for saving must be opened in binary mode ('wb' or 'rb').
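
As a sketch, saving and loading with pickle.dump() and pickle.load() works much like the JSON functions, except for the binary mode. The data and the file name data.pkl are hypothetical.

```python
import os
import pickle
import tempfile

# A tuple and a set keep their types when pickled.
data = {"numbers": (1, 2, 3), "names": {"Ada", "Grace"}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.pkl")  # hypothetical file name

    with open(path, "wb") as f:   # binary mode for writing
        pickle.dump(data, f)

    with open(path, "rb") as f:   # binary mode for reading
        loaded = pickle.load(f)

print(loaded == data, type(loaded["numbers"]).__name__)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.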

Shelve

The shelve module in Python provides a convenient way to persistently store and retrieve Python objects using a dictionary-like interface.

The shelve.open() function can be used to open a file for reading or writing:

import shelve
with shelve.open('test.db') as f:
    f['test1'] = { 'Integer': 17, 'Float': 3.14, 'String': 'What is this?' }

Import the shelve module:

The code begins by importing the shelve module, which provides a simple interface for persistently storing and retrieving Python objects.

Open a shelf file:

The line with shelve.open('test.db') as f: opens a shelf file named 'test.db' in the current working directory. The with statement is used here to ensure that the shelf file is properly closed after the indented block of code.

Store data in the shelf:

Inside the with block, the line f['test1'] = {'Integer': 17, 'Float': 3.14, 'String': 'What is this?'} stores a dictionary with three key-value pairs in the shelf. The key is 'test1', and the corresponding value is a dictionary containing an integer, a float, and a string.

Automatic closing of the shelf:

Because the with statement is used, the shelf file is automatically closed when the block of code is exited. This ensures that any changes made to the shelf are saved.
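
Reading the data back works through the same dictionary-like interface. The following sketch stores the entry in a temporary directory (an assumption made so that no file is left behind) and then opens the shelf a second time.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.db")

    # First session: store an entry under the key 'test1'.
    with shelve.open(path) as f:
        f["test1"] = {"Integer": 17, "Float": 3.14, "String": "What is this?"}

    # Second session: the entry is still there and can be read like a dict item.
    with shelve.open(path) as f:
        keys = list(f.keys())
        entry = f["test1"]

print(keys)
print(entry["Integer"], entry["String"])
```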


OS

The os module in Python provides a way to interact with the operating system, including performing operations that affect the file system. It allows you to perform various tasks such as file and directory manipulation, process management, and more.

The most important functions include the following:

import os
os.rename("second_test.txt", "murx.txt")  # Rename a file
os.remove("murx.txt")  # Delete a file
os.getcwd()  # Return the current working directory
os.mkdir('pictures')  # Create a new directory
os.chdir('pictures')  # Change into the directory
os.listdir('..')  # List the contents of the parent directory
os.chdir('..')  # Change back to the parent directory
os.rmdir('pictures')  # Delete the (empty) directory




pathlib library: Using this library, for example, you can change the extension of a file.






In [None]:
from pathlib import Path
original_file_name = 'star_wars.txt'
new_file_name = Path(original_file_name).stem + '.pdf'
new_file_name
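
Path objects also offer the with_suffix() method for exactly this purpose. The sketch below shows it as an alternative to string concatenation; note that it only builds a new path object and does not rename anything on disk.

```python
from pathlib import Path

original = Path("star_wars.txt")

stem = original.stem                      # 'star_wars'
suffix = original.suffix                  # '.txt'
new_path = original.with_suffix(".pdf")   # new path object with the other extension
new_file_name = new_path.name             # 'star_wars.pdf'

print(stem, suffix, new_file_name)
```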


Tokenization

from nltk.tokenize import word_tokenize
tokens = word_tokenize(roh_text)
print(tokens[:10])

In the next step, we turn the token list into an instance of NLTK’s Text class in order to be able
to perform further operations on this document. The Text class is part of NLTK's text module and is used for the analysis of text corpora.

import nltk
text = nltk.Text(tokens)

text.collocations()  # Prints the collocations itself and returns None




find()

print(roh_text.find("Faust: Der Tragödie erster Teil",400))

This line uses the find() method on the string roh_text.
The first argument to find() is the substring to search for, which is "Faust: Der Tragödie erster Teil" in this case.
The second argument is the index at which the search starts, here 400. find() returns the index of the first match, or -1 if the substring is not found.

rfind() searches for the last occurrence of the substring:

roh_text.rfind("*** END OF THE")
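
The behaviour of find() and rfind() is easy to verify on a short string. The sample string below is hypothetical and merely stands in for roh_text.

```python
text = "abc START middle START end"  # hypothetical sample string

first = text.find("START")          # index of the first occurrence
from_index = text.find("START", 5)  # search starts at index 5
last = text.rfind("START")          # index of the last occurrence
missing = text.find("xyz")          # -1, because the substring does not occur

print(first, from_index, last, missing)  # 4 17 17 -1
```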