# The Python Language

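Python is an interpreted language, unlike C and C++, which are compiled languages. Java sits in between: its source is compiled to bytecode, which the Java Virtual Machine then interprets or JIT-compiles.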

Ahead-of-time (AOT) compiled languages are compiled into machine code for a specific operating system and processor architecture. The compiled code is packaged in an executable containing instructions that the OS and CPU understand directly.

Python, on the other hand, is interpreted at runtime, i.e. when you run `$ python file.py`. The CPython interpreter first compiles the Python source into bytecode, which CPython's virtual machine then executes. CPython, written in C, is the most widely used interpreter. Other implementations include PyPy, which uses a just-in-time (JIT) compiler and is written in RPython, a restricted subset of the Python language.
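To make the compile step above visible, here is a small sketch using the standard-library `dis` module, which disassembles CPython bytecode. The function `add` is only an illustrative example; the exact opcodes printed by `dis.dis` vary between Python versions, and the listing describes CPython specifically.

```python
import dis

def add(a, b):
    return a + b

# By the time `add` is defined, CPython has already compiled its body to bytecode,
# which is stored on the function's code object.
print(type(add.__code__))           # <class 'code'>
print(type(add.__code__.co_code))   # <class 'bytes'>

# compile() performs the same source-to-bytecode step on a string of source code.
code_obj = compile('1 + 2', '<string>', 'eval')
print(eval(code_obj))               # 3

# Print a human-readable listing of add's bytecode (opcodes differ across versions).
dis.dis(add)
```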

Python is a general-purpose programming language whose popularity has soared over the last decade. It is widely used in scientific computing, data science and web development. NumPy, SciPy, Pandas, Matplotlib, scikit-learn and PyTorch are a few popular packages used in scientific computing and data science; Django, Flask, Pyramid and Tornado are a few popular frameworks used in web development.

Without further ado, let's start learning the language.

## Introduction

This workbook introduces various concepts in the Python language while solving a very basic problem: reading the contents of a file and analyzing it. Note that all of the code is written for Python 3.6, and some parts of it will not work with older versions of Python.

In the course of solving the problem, we are going to look at some of the data types, a few built-in functions and how loops and conditionals are written in Python.

NOTE: Before you go any further, make sure the `basics_dataset.txt` file exists in your current directory. If it doesn't, create it by running the `create_dataset.py` script: `$ python create_dataset.py`.
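The `create_dataset.py` script itself is not shown in this notebook. If you don't have it, the following is a hypothetical stand-in that assumes only the layout visible in the file below: one comment line, then 10 rows whose first value is the row index followed by 10 random integers between 0 and 100. The function name `write_sample_dataset` is made up for this sketch, and because it generates different random numbers, your outputs will not match the ones shown here.

```python
import os
import random

def write_sample_dataset(path, nrows=10, ncols=10, seed=None):
    """Hypothetical stand-in for create_dataset.py: write a comment line,
    then `nrows` rows of "index,v1,...,vN" with random integers in [0, 100]."""
    rng = random.Random(seed)
    with open(path, mode='w') as file_obj:
        file_obj.write('# Sample Dataset \n')
        for i in range(nrows):
            # First value is the row index, the rest are random integers.
            values = [i] + [rng.randint(0, 100) for _ in range(ncols)]
            file_obj.write(','.join(str(v) for v in values) + '\n')

# Only create the file if it is missing, so an existing dataset is never overwritten.
if not os.path.exists('basics_dataset.txt'):
    write_sample_dataset('basics_dataset.txt')
```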

### Opening files using Python

Here we define a few constants that will be used for reading the contents of the file.

In [1]:
LINE_DELIMITER = '\n'
COMMENT_CHARACTER = '#'
DELIMITER = ','

`open` is a built-in function that opens a file and returns a file object, whose methods let you read (or, depending on the mode, write) the contents of the file. See the [Python documentation](https://docs.python.org/3/library/functions.html#open) for the `open` function for more details.

First, let's take a peek at the contents of the file we are working with.

In [2]:
!cat basics_dataset.txt

# Sample Dataset 
0,100,22,77,48,82,26,54,8,56,56
1,76,85,14,38,50,62,73,73,9,93
2,73,49,22,46,0,41,78,57,2,92
3,16,62,44,13,62,100,66,49,20,51
4,48,73,2,58,99,79,64,46,23,4
5,30,55,17,9,59,32,51,60,57,70
6,41,49,3,69,59,12,77,30,69,56
7,35,84,46,9,47,62,12,61,59,87
8,18,12,66,73,15,53,59,32,36,96
9,54,2,7,3,9,66,4,28,16,84


As you can see above, the file starts with a comment line, marked by the `#` character. The remaining lines contain integers separated by the `,` character: there are 10 rows and 11 columns of integer data.

In [3]:
file_obj = open("basics_dataset.txt", mode='r')

`help` is also a built-in function; it displays the documentation for a function or object.

In [4]:
help(open)

Help on built-in function open in module io:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
    Open file and return a stream.  Raise IOError upon failure.
    
    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)
    
    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position

`type` is also a built-in function; called with a single argument, it returns the type of that object. See more in the [docs](https://docs.python.org/3/library/functions.html#type).

In [5]:
type(file_obj)

_io.TextIOWrapper

In [6]:
type(open)

builtin_function_or_method
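The type of the object returned by `open` depends on the mode: text mode gives a `TextIOWrapper`, binary mode (`'rb'`) gives a `BufferedReader`. A minimal sketch checking this with the `io` and `types` modules; the temporary file and its name `demo.txt` are assumptions for the example.

```python
import io
import os
import tempfile
import types

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')
    with open(path, mode='w') as f:
        f.write('hello\n')
    with open(path, mode='r') as f:
        text_mode_type = type(f)       # text mode
    with open(path, mode='rb') as f:
        binary_mode_type = type(f)     # binary mode

print(text_mode_type is io.TextIOWrapper)           # True
print(binary_mode_type is io.BufferedReader)        # True
# `builtin_function_or_method` is exposed as types.BuiltinFunctionType.
print(isinstance(open, types.BuiltinFunctionType))  # True
```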

Coming back to the Python object returned by the `open` function, let's see how we can read the contents of the file from the Python object.

`dir` is also a built-in function. Called on an object such as a class instance, it returns an alphabetically sorted list of the names of the attributes and methods available on that object. Called on a module, it returns the names of the variables, functions and classes defined in that module. See more in the [docs](https://docs.python.org/3/library/functions.html#dir).

In [7]:
dir(file_obj)

['_CHUNK_SIZE',
 '__class__',
 '__del__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__next__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '_checkClosed',
 '_checkReadable',
 '_checkSeekable',
 '_checkWritable',
 '_finalizing',
 'buffer',
 'close',
 'closed',
 'detach',
 'encoding',
 'errors',
 'fileno',
 'flush',
 'isatty',
 'line_buffering',
 'mode',
 'name',
 'newlines',
 'read',
 'readable',
 'readline',
 'readlines',
 'seek',
 'seekable',
 'tell',
 'truncate',
 'writable',
 'write',
 'writelines']
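Much of the list above consists of "dunder" names (`__enter__`, `__iter__`, ...) used internally by Python. A small sketch that keeps only the public names; `public_names` is a hypothetical helper, and `io.StringIO`, an in-memory text stream, stands in for the real file so the example is self-contained.

```python
import io

def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]

# io.StringIO offers the same reading methods as a file opened in text mode.
names = public_names(io.StringIO('example'))
print([n for n in names if n.startswith('read') or n == 'seek'])
```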

Looking at the methods defined on the File-like object, a few methods that seem relevant are `read`, `readline` and `readlines`. Let's take a look at what they return when called.

The `read` method of the file object returns the entire contents of the file as a single Python string (`str`) object. See more about the Python `str` type in the [docs](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str).

In [8]:
file_obj.read()

'# Sample Dataset \n0,100,22,77,48,82,26,54,8,56,56\n1,76,85,14,38,50,62,73,73,9,93\n2,73,49,22,46,0,41,78,57,2,92\n3,16,62,44,13,62,100,66,49,20,51\n4,48,73,2,58,99,79,64,46,23,4\n5,30,55,17,9,59,32,51,60,57,70\n6,41,49,3,69,59,12,77,30,69,56\n7,35,84,46,9,47,62,12,61,59,87\n8,18,12,66,73,15,53,59,32,36,96\n9,54,2,7,3,9,66,4,28,16,84\n'
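Of the three methods mentioned earlier, only `read` is shown above. A sketch of the other two, `readline` and `readlines`, using `io.StringIO` with a few lines copied from the dataset so that it runs without the file:

```python
import io

# In-memory stand-in for the file, holding the first three lines of the dataset (truncated).
stream = io.StringIO('# Sample Dataset \n0,100,22\n1,76,85\n')

first_line = stream.readline()   # reads up to and including the next '\n'
remaining = stream.readlines()   # reads every remaining line into a list

print(repr(first_line))  # '# Sample Dataset \n'
print(remaining)         # ['0,100,22\n', '1,76,85\n']
```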

Once the `read` method has read the contents of the file, the stream position (the cursor) is at the end of the file, so calling `read` again returns an empty string. To read the contents again from the beginning, call the `seek` method with `0` as its argument, which moves the position back to the start of the file. `seek` returns the new position, which is why the cell below displays `0`.

In [9]:
file_obj.seek(0)

0
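A sketch of the read/seek behaviour described above, again on an in-memory `io.StringIO` stream. For files opened in text mode, the `io` documentation says the only safe `seek` offsets are `0` or a value previously returned by `tell()`.

```python
import io

stream = io.StringIO('0,100,22\n1,76,85\n')

first_read = stream.read()      # consumes everything; the position is now at the end
second_read = stream.read()     # nothing left to read, so this is ''
new_position = stream.seek(0)   # move back to the start; seek returns the new position
third_read = stream.read()      # the full contents again

print(repr(second_read))         # ''
print(new_position)              # 0
print(third_read == first_read)  # True
```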

Now, let's assign the string returned by the `read` method to a variable for further analysis.

In [10]:
text_data = file_obj.read()

Once you're done reading the contents of the file object, make sure you close the file by calling the `close` method on the object.

In [11]:
file_obj.close()

If you don't want to worry about forgetting to close a file, you can use the `with` statement, which closes the file automatically when the block exits, even if an exception is raised inside it. See below for an example of how the `with` statement can be used. For more on the `with` statement in Python, see the [docs](https://docs.python.org/3/reference/compound_stmts.html#the-with-statement).

In [12]:
with open("basics_dataset.txt", mode='r') as file_obj:
    text_data = file_obj.read()
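To check that `with` really closes the file even when something goes wrong, here is a sketch that raises an error inside the block, using `io.StringIO` as a stand-in for a file; the `ValueError` is simulated for illustration. It also shows a roughly equivalent `try`/`finally` version without `with`.

```python
import io

stream = io.StringIO('0,100,22\n')
try:
    with stream:
        stream.read()
        # Simulate an error part-way through processing the file.
        raise ValueError('something went wrong')
except ValueError:
    pass

# The with statement closed the stream even though the block raised.
print(stream.closed)  # True

# Roughly equivalent code without `with`, using try/finally:
other = io.StringIO('0,100,22\n')
try:
    other.read()
finally:
    other.close()
print(other.closed)   # True
```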

`print` is also a built-in function. See more in the [docs](https://docs.python.org/3/library/functions.html#print).

In [13]:
print(text_data)

# Sample Dataset 
0,100,22,77,48,82,26,54,8,56,56
1,76,85,14,38,50,62,73,73,9,93
2,73,49,22,46,0,41,78,57,2,92
3,16,62,44,13,62,100,66,49,20,51
4,48,73,2,58,99,79,64,46,23,4
5,30,55,17,9,59,32,51,60,57,70
6,41,49,3,69,59,12,77,30,69,56
7,35,84,46,9,47,62,12,61,59,87
8,18,12,66,73,15,53,59,32,36,96
9,54,2,7,3,9,66,4,28,16,84
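Compare the output of In [8], where the notebook displays the string with `\n` escape sequences, with the `print` output above, which shows real line breaks. The notebook shows an expression's `repr`, while `print` writes the characters themselves. A small sketch using the first two lines of the dataset:

```python
sample = '# Sample Dataset \n0,100,22\n'

# repr() shows escape sequences; this is what the notebook displays for a bare expression.
print(repr(sample))
# print() writes the characters themselves, so each '\n' becomes a real line break.
print(sample)
```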



As mentioned earlier, the value returned by the `read` method on the file object, stored in the `text_data` variable, is a Python `str` (string) object.

In [14]:
print(type(text_data))

<class 'str'>


To perform any analysis on the text data, we first need to split the string into individual rows and then split each row into individual columns. We can use some of the methods on the `str` type to achieve the desired transformations.

Let's use the built-in `dir` function again to take a look at the methods available on a string object.

In [15]:
dir(text_data)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

Of the methods listed above, we are going to make use of the `split` and `startswith` methods.

In [16]:
row_wise_text_data = text_data.split(LINE_DELIMITER)

As you can see below, the `split` method on a string object splits the string into sub-strings wherever a separator occurs. In this case, we pass the `LINE_DELIMITER` character, which was set to `\n` at the top of this notebook. See more about the `split` method in the [docs](https://docs.python.org/3/library/stdtypes.html#str.split).

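`split` returns a `list` of sub-strings; if the separator does not occur in the string, the list contains the original string as its only element. Read more about the `list` type in the [docs](https://docs.python.org/3/library/stdtypes.html#list). Note the empty string at the end of the list printed below: because the file ends with a newline, splitting on `\n` produces an empty final element.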

In [17]:
print(row_wise_text_data)

['# Sample Dataset ', '0,100,22,77,48,82,26,54,8,56,56', '1,76,85,14,38,50,62,73,73,9,93', '2,73,49,22,46,0,41,78,57,2,92', '3,16,62,44,13,62,100,66,49,20,51', '4,48,73,2,58,99,79,64,46,23,4', '5,30,55,17,9,59,32,51,60,57,70', '6,41,49,3,69,59,12,77,30,69,56', '7,35,84,46,9,47,62,12,61,59,87', '8,18,12,66,73,15,53,59,32,36,96', '9,54,2,7,3,9,66,4,28,16,84', '']


In [18]:
print(type(row_wise_text_data))

<class 'list'>
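A sketch of the `split` behaviours discussed above on a short string taken from the dataset, plus `str.splitlines`, an alternative that does not leave a trailing empty string when the text ends with a newline:

```python
text = '# Sample Dataset \n0,100,22\n1,76,85\n'

# The final '\n' produces a trailing empty string.
print(text.split('\n'))    # ['# Sample Dataset ', '0,100,22', '1,76,85', '']

# splitlines() splits on line boundaries without the trailing empty string.
print(text.splitlines())   # ['# Sample Dataset ', '0,100,22', '1,76,85']

# If the separator never occurs, the whole string comes back as a one-element list.
print('no newline here'.split('\n'))  # ['no newline here']
```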


Now that we have split the file contents into rows, represented as the individual strings in the list, let's break up each row into columns. To do so, we can loop over each row (string) in the list and split it up into columns (strings). Let's look at how we can use the `for` loop in Python to achieve this.

One way to iterate over the list elements is to loop over their indices:

In [19]:
nrows = len(row_wise_text_data)
for i in range(nrows):
    row = row_wise_text_data[i]
    # do something to row/string data.

But there's a better way to loop over the rows in the list. A list is an iterable, which means a `for` loop can step through its elements directly, without an index. A more Pythonic way of iterating over the rows in the list is shown below:

In [20]:
for row in row_wise_text_data:
    # comments in Python code start with the `#` character
    # in the ith iteration of the for loop, the variable `row` refers
    # to the ith element in the list row_wise_text_data.
    # e.g. in the first iteration, i=0 and row refers to the first element in the list.
    # e.g. in the third iteration, i=2 and row refers to the third element in the list.
    if row.startswith(COMMENT_CHARACTER):
        continue
    elif row == "":
        continue
    else:
        col_wise_text_data = row.split(DELIMITER)
        print(col_wise_text_data)

['0', '100', '22', '77', '48', '82', '26', '54', '8', '56', '56']
['1', '76', '85', '14', '38', '50', '62', '73', '73', '9', '93']
['2', '73', '49', '22', '46', '0', '41', '78', '57', '2', '92']
['3', '16', '62', '44', '13', '62', '100', '66', '49', '20', '51']
['4', '48', '73', '2', '58', '99', '79', '64', '46', '23', '4']
['5', '30', '55', '17', '9', '59', '32', '51', '60', '57', '70']
['6', '41', '49', '3', '69', '59', '12', '77', '30', '69', '56']
['7', '35', '84', '46', '9', '47', '62', '12', '61', '59', '87']
['8', '18', '12', '66', '73', '15', '53', '59', '32', '36', '96']
['9', '54', '2', '7', '3', '9', '66', '4', '28', '16', '84']
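The index-based loop shown earlier is mainly useful when you need the position as well as the element. In that case, the built-in `enumerate` provides both without `range(len(...))`. A sketch on a few hand-copied rows:

```python
rows = ['# Sample Dataset ', '0,100,22', '1,76,85', '']

# enumerate yields (index, element) pairs, starting at 0 by default.
for index, row in enumerate(rows):
    print(index, repr(row))
```

This prints `0 '# Sample Dataset '`, then `1 '0,100,22'`, `2 '1,76,85'` and `3 ''`.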


Above we also see the use of the `startswith` method on a string object in Python. Instead of explicitly checking whether the beginning of the string matches a specific character, we simply use the `startswith` method. See more in the Python [docs](https://docs.python.org/3/library/stdtypes.html#str.startswith)
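One practical advantage of `startswith` over indexing is that it is safe on empty strings, such as the empty last row produced by `split`. A sketch comparing the approaches; the `';'` prefix is just an example of a second comment marker.

```python
row = '# Sample Dataset '
empty_row = ''

print(row.startswith('#'))         # True
print(row[:1] == '#')              # True: the manual equivalent using a slice
print(empty_row.startswith('#'))   # False: safe on an empty string...
try:
    empty_row[0] == '#'            # ...whereas indexing an empty string fails
except IndexError:
    print('IndexError')
print(row.startswith(('#', ';')))  # True: a tuple checks several prefixes at once
```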

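Finally, as you can see, we split the string representing each row into sub-strings representing the columns of that row. After the loop finishes, `col_wise_text_data` still refers to the list of columns from the last row processed: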

In [21]:
print(type(col_wise_text_data))

<class 'list'>


Now, let's store all of the data in a 2D list (a list of lists), converting each value to an integer along the way.

In [22]:
dataset = []
for row in row_wise_text_data:
    if row.startswith(COMMENT_CHARACTER):
        continue
    elif row == "":
        continue
    else:
        col_wise_text_data = row.split(DELIMITER)
        parsed_row = []
        for col in col_wise_text_data:
            parsed_col = int(col)
            parsed_row.append(parsed_col)
        dataset.append(parsed_row)

We used the `append` method on the list object to collect every row of data in a single list. Each row is itself a list of integers. `append` is one of the methods available on a list; a list is a mutable sequence in Python, and you can read more about the other methods available on mutable sequences in the [docs](https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types).
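A common point of confusion is `append` versus `extend`, another mutable-sequence method. `append` adds its argument as one element, which is what builds the nested rows above; `extend` adds each element of its argument separately. A short sketch with made-up variable names:

```python
nested = []
nested.append([0, 100, 22])   # appends ONE element: the whole inner list
nested.append([1, 76, 85])
print(nested)                 # [[0, 100, 22], [1, 76, 85]]

flat = []
flat.extend([0, 100, 22])     # appends each element of the argument separately
flat.extend([1, 76, 85])
print(flat)                   # [0, 100, 22, 1, 76, 85]
```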

Also note the use of the built-in `int` function, which converts its argument to a Python integer. Given a string such as `'56'`, it parses the digits; it raises a `ValueError` if the string is not a valid integer literal. See more in the [docs](https://docs.python.org/3/library/functions.html#int).
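A sketch of how `int` treats a few kinds of string input. Surrounding whitespace, including a newline, is ignored, but a decimal point is not accepted:

```python
print(int('56'))     # 56
print(int(' 7\n'))   # 7: leading/trailing whitespace is ignored

try:
    int('4.5')       # not an integer literal
except ValueError as err:
    print('ValueError:', err)
```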

Next, let's use a list comprehension to replace the inner `for` loop over the column values in a row. A list comprehension is a concise way to create lists. See more in the [docs](https://docs.python.org/3/tutorial/datastructures.html?highlight=list%20comprehension#list-comprehensions).

In [23]:
dataset = []
for row in row_wise_text_data:
    if row.startswith(COMMENT_CHARACTER):
        continue
    elif row == "":
        continue
    else:
        col_wise_text_data = row.split(DELIMITER)
        dataset.append(
            [int(col_value) for col_value in col_wise_text_data]
        )
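The same idea can be pushed one step further: a nested list comprehension can also filter out the comment and blank lines with an `if` clause instead of `continue`. A sketch under that approach; `parse_dataset` is a hypothetical name, and it uses `splitlines` rather than `split(LINE_DELIMITER)` so no trailing empty string is produced.

```python
COMMENT_CHARACTER = '#'
DELIMITER = ','

def parse_dataset(text):
    """Parse lines of comma-separated integers into a list of lists,
    skipping blank lines and lines that start with COMMENT_CHARACTER."""
    return [
        [int(value) for value in line.split(DELIMITER)]
        for line in text.splitlines()
        if line and not line.startswith(COMMENT_CHARACTER)
    ]

sample_text = '# Sample Dataset \n0,100,22\n1,76,85\n'
print(parse_dataset(sample_text))  # [[0, 100, 22], [1, 76, 85]]
```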

Finally, we have parsed the contents of the file into a 2D list, where the first index selects a row in the file and the second index selects a column in that row.

In [24]:
dataset

[[0, 100, 22, 77, 48, 82, 26, 54, 8, 56, 56],
 [1, 76, 85, 14, 38, 50, 62, 73, 73, 9, 93],
 [2, 73, 49, 22, 46, 0, 41, 78, 57, 2, 92],
 [3, 16, 62, 44, 13, 62, 100, 66, 49, 20, 51],
 [4, 48, 73, 2, 58, 99, 79, 64, 46, 23, 4],
 [5, 30, 55, 17, 9, 59, 32, 51, 60, 57, 70],
 [6, 41, 49, 3, 69, 59, 12, 77, 30, 69, 56],
 [7, 35, 84, 46, 9, 47, 62, 12, 61, 59, 87],
 [8, 18, 12, 66, 73, 15, 53, 59, 32, 36, 96],
 [9, 54, 2, 7, 3, 9, 66, 4, 28, 16, 84]]

In [25]:
# print all columns from the first row of data
dataset[0][:]

[0, 100, 22, 77, 48, 82, 26, 54, 8, 56, 56]

In [26]:
# print columns 2-4 (indices 1 to 3) from the third row of data
dataset[2][1:4]

[73, 49, 22]
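The slice `[1:4]` above returns three values because slices are half-open: they include the start index and stop just before the stop index. A sketch on the first six values of the third row, which also shows that `[:]`, used in In [25], builds a copy:

```python
row = [2, 73, 49, 22, 46, 0]   # first six values of the third row above

# A slice [start:stop] includes index `start` but stops before index `stop`,
# so it contains stop - start elements.
middle = row[1:4]
print(middle)        # [73, 49, 22]
print(len(middle))   # 3

# Omitting start or stop means "from the beginning" / "to the end".
print(row[:2])       # [2, 73]
print(row[4:])       # [46, 0]

# row[:] builds a new list with the same elements (a shallow copy).
row_copy = row[:]
print(row_copy == row, row_copy is row)  # True False
```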

In [27]:
# print the last column from the last row of data
print(dataset[9][10])

# or
print(dataset[-1][-1])

84
84
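Indexing picks out rows easily, but a column is spread across every row. Here is a sketch of two ways to extract columns, using only the first four columns of the first three rows of `dataset` as sample data: a list comprehension, and `zip(*rows)`, which transposes the rows into one tuple per column. The average at the end is just an example of a simple statistic.

```python
# First four columns of the first three rows of `dataset` above, as sample data.
sample_rows = [
    [0, 100, 22, 77],
    [1, 76, 85, 14],
    [2, 73, 49, 22],
]

# Column 2 (index 1) taken from every row.
second_column = [row[1] for row in sample_rows]
print(second_column)   # [100, 76, 73]

# zip(*rows) passes each row as a separate argument to zip, which groups
# the elements at the same position: the result is one tuple per column.
columns = list(zip(*sample_rows))
print(columns[2])      # (22, 85, 49)

# A simple statistic on one column.
average = sum(second_column) / len(second_column)
print(average)         # 83.0
```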


## Summary

In summary, we covered the `int`, `str` and `list` types in Python and how to read a file through a file object. We also took a look at how loops and conditionals are written in Python. Finally, we used a few built-in functions from the Python standard library.