# Interact with the Operating System

Sources:
- [How to Build Command Line Interfaces in Python With argparse](https://realpython.com/command-line-interfaces-python-argparse/)
- [Comparing Python Command-Line Parsing Libraries – Argparse, Docopt, and Click](https://realpython.com/comparing-python-command-line-parsing-libraries-argparse-docopt-click/)
- [Command Line Arguments in Python](https://stackabuse.com/command-line-arguments-in-python/)
- [Argparse Tutorial](https://docs.python.org/3/howto/argparse.html)
- [Argument Parsing in Python](https://www.datacamp.com/community/tutorials/argument-parsing-in-python)
- [How to Write Perfect Python Command-line Interfaces](https://www.sicara.ai/blog/2018-12-18-perfect-command-line-interfaces-python)
- [Click](https://click.palletsprojects.com/en/7.x/)
- [Google IT Automation with Python Professional Certificate](https://www.coursera.org/professional-certificates/google-it-automation)
- [Working With Files in Python](https://realpython.com/working-with-files-in-python/)
- [Python Input and Output to Handle User and File Input](https://pynative.com/python-input-function-get-user-input/)
- [subprocess — Spawning Additional Processes](https://pymotw.com/3/subprocess/)

## Command-line Interfaces

Computer programs are written with a specific purpose in mind. Tools on UNIX/Linux systems follow the principle of specialization: one tool for one task, done as well as possible. Because each tool does one thing well, you can combine individual programs into powerful tool chains.
 
Command line arguments passed to a program let it handle much more specific use cases. They control how the program behaves, for example whether it outputs additional information, which source it reads data from, and how it interprets that data.

In general, programs accept arguments in a notation that follows the conventions of their platform, for example:
- UNIX: "-" followed by a letter, like "-h"
- GNU: "--" followed by a word, like "--help"
- Microsoft Windows: "/" followed by either a letter, or word, like "/help"

These different approaches exist for historical reasons. Many programs on UNIX-like systems support the UNIX style, the GNU style, or both. The UNIX notation is mostly used for single-letter options, while the GNU notation gives a more readable list of options, which is particularly useful for documenting what a command does.

Keep in mind that both the name and the meaning of an argument are specific to each program. There is no general definition, only a few conventions, such as --help for information on how to use the tool. As the developer of a Python script, you decide which arguments are valid and what they stand for, and your script has to evaluate them properly. Read on to learn how to do this in Python.

### Handling command line arguments with Python

Python 3 supports four different ways of handling command line arguments. The oldest one is the sys module; its names and usage relate directly to the C library (libc). The second is the getopt module, which handles both short and long options, including the evaluation of option values.

Furthermore, two more ways exist: the argparse module, which is part of the standard library and superseded the older optparse module, and the docopt module, a third-party package available from GitHub. All of these modules are fully documented and worth reading about.

#### The sys Module

This is a basic module that has shipped with Python since its early days. Its approach is similar to the C library, which uses argc/argv to access the arguments. The sys module stores the command line arguments in a simple list named `sys.argv`.

Each list element represents a single argument. The first one, `sys.argv[0]`, is the name of the Python script. The remaining elements, `sys.argv[1]` to `sys.argv[n]`, are the command line arguments 1 to n. The shell separates the arguments at spaces, so an argument value that contains a space has to be quoted.

The equivalent of argc is simply the number of elements in the list, which you obtain with Python's built-in len() function. Example 2 explains this in detail.
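
To see how quoting affects the list, Python's shlex module, which implements shell-like splitting, can serve as a stand-in for the shell in a quick experiment. This is only an approximation under that assumption: the real splitting is done by your shell before Python starts, and Windows shells follow different quoting rules. The helper simulate_argv() and the command line are hypothetical.

```python
import shlex

def simulate_argv(command_line):
    """Approximate how a POSIX shell splits a command line into argv (hypothetical helper)."""
    return shlex.split(command_line)

# the script name plays the role of sys.argv[0]
argv = simulate_argv('arguments.py --option "long string"')
argc = len(argv)  # like C's argc, this count includes the script name

print(argv)  # ['arguments.py', '--option', 'long string']
print(argc)  # 3
```

The quoted value stays together as a single list element, which is why `"long string"` counts as one argument in the examples below.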

In [None]:
# Basic example: print the raw argument list
import sys

print(sys.argv)

> Try calling the script with different arguments.

##### Example 1: Determine the name of the Python script

In this first example, we determine how the script was called. This information is kept in the first element of the list, at index 0. Listing 1 shows how you obtain the name of your Python script.

In [3]:
import sys

print(f"the script has the name {sys.argv[0]}")

the script has the name /opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py


Save this code in a file named arguments-programname.py, and then call it as shown in Listing 1. The output contains the script name exactly as it was passed to the interpreter, including the path if you specified one.

    $ python arguments-programname.py
    the script has the name arguments-programname.py
    
    $ python /home/user/arguments-programname.py
    the script has the name /home/user/arguments-programname.py
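
If you only need the file name without the directory part, one option is os.path.basename(). The helper script_name() below is hypothetical; it takes the argument list as a parameter so it can be tried without a real command line.

```python
import os
import sys

def script_name(argv):
    """Return the script's file name without any leading directories (hypothetical helper)."""
    # argv[0] holds the path exactly as typed; basename() strips the directories
    return os.path.basename(argv[0])

if __name__ == '__main__':
    print(f"the script has the name {script_name(sys.argv)}")
```

Called as `python /home/user/arguments-programname.py`, this sketch would print only `arguments-programname.py`.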

##### Example 2: Count the arguments

In the second example, we simply count the number of command line arguments by applying the built-in len() function to the list sys.argv. We subtract 1 from the length because, as you may remember from Example 1, the first element contains the name of the Python script, which is not an argument.

In [2]:
import sys

# count the arguments
arguments = len(sys.argv) - 1
print(f"the script is called with {arguments} arguments")

the script is called with 2 arguments


Save this file as arguments-count.py. Listing 2 shows the calls for three different scenarios: a) a call without any further command line arguments, b) a call with two arguments, and c) a call with two arguments where the second one is a quoted string (a string that contains a space).

    $ python arguments-count.py
    the script is called with 0 arguments

    $ python arguments-count.py --help me
    the script is called with 2 arguments

    $ python arguments-count.py --option "long string"
    the script is called with 2 arguments

##### Example 3: Output arguments

The third example outputs every argument the Python script is called with, except the program name itself. Therefore, we loop through the command line arguments starting with the second list element, which, as stated before, has index 1.

In [5]:
import sys

# count the arguments
arguments = len(sys.argv) - 1

# output argument-wise
position = 1
while (arguments >= position):
    print(f"parameter {position}: {sys.argv[position]}")
    position = position + 1

parameter 1: -f
parameter 2: /home/jovyan/.local/share/jupyter/runtime/kernel-4afff967-7986-49dc-afb4-3e7b5e8e5c6c.json


In Listing 3 the Python script is named arguments-output.py. As in Listing 2, the output illustrates three different calls: a) without any arguments, b) with two arguments, and c) also with two arguments, where the second argument is a quoted string consisting of two words separated by a space.

    $ python arguments-output.py

    $ python arguments-output.py --help me
    parameter 1: --help
    parameter 2: me

    $ python arguments-output.py --option "long string"
    parameter 1: --option
    parameter 2: long string
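
The same loop can be written more idiomatically with enumerate(), which numbers the elements for you. This is an alternative sketch, not the original listing; format_arguments() is a hypothetical helper that returns the lines instead of printing them, so it can be checked without a real command line.

```python
import sys

def format_arguments(argv):
    """Number every argument except argv[0] (hypothetical helper)."""
    # start=1 makes the numbers match the list indices of the arguments
    return [f"parameter {position}: {value}"
            for position, value in enumerate(argv[1:], start=1)]

if __name__ == '__main__':
    for line in format_arguments(sys.argv):
        print(line)
```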

#### The getopt Module

> **Note:** The getopt module is a parser for command line options whose API is designed to be familiar to users of the C getopt() function. Users who are unfamiliar with the C getopt() function or who would like to write less code and get better help and error messages should consider using the argparse module instead.

As you have seen, the sys module only splits the command line into individual arguments. The Python getopt module goes a bit further and adds parameter validation. Based on the C getopt() function, it supports both short and long options, including assigning values to them.

In practice, it relies on the sys module to get the input data, so both the sys module and the getopt module have to be imported first. Next, we remove the first element (the script name) from the list of input parameters and store the remaining command line arguments in the variable called argumentList.

##### Example 4: Parse options with getopt

In [None]:
# Preparing the input parameters
# include standard modules
import getopt, sys

# read commandline arguments, first
fullCmdArguments = sys.argv

# - further arguments
argumentList = fullCmdArguments[1:]

print(argumentList)

Now, argumentList can be parsed using the getopt.getopt() function. Before doing that, getopt() needs to know which parameters are valid. They are defined like this:

In [None]:
unixOptions = "ho:v"
gnuOptions = ["help", "output=", "verbose"]

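In the short-option string, a colon after a letter (as in `o:`) marks an option that requires a value; in the long-option list, a trailing `=` (as in `output=`) does the same. This means that these arguments are now considered valid: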

    ------------------------------------------
    long argument   short argument  with value
    ------------------------------------------
    --help           -h              no
    --output         -o              yes
    --verbose        -v              no
    ------------------------------------------
    
Next, you can process the argument list. The getopt() function takes three parameters: the list of remaining arguments, the valid UNIX (short) options, and the valid GNU (long) options (see the table above).

The function call itself is wrapped in a try/except statement to handle errors during the evaluation. An exception is raised if an argument is found that is not part of the list defined before. In that case, the Python script prints the error message to the screen and exits with error code 2.

In [None]:
try:
    arguments, values = getopt.getopt(argumentList, unixOptions, gnuOptions)
except getopt.error as err:
    # output error, and return with an error code
    print(str(err))
    sys.exit(2)

Finally, getopt() returns two lists, which are stored in the variables arguments and values: arguments holds an (option, value) pair for each recognized option, and values holds the remaining non-option arguments. Now you can evaluate these variables. The for loop goes through the list of recognized arguments, one entry after the next.

In [None]:
# evaluate given options
for currentArgument, currentValue in arguments:
    if currentArgument in ("-v", "--verbose"):
        print("enabling verbose mode")
    elif currentArgument in ("-h", "--help"):
        print("displaying help")
    elif currentArgument in ("-o", "--output"):
        print(f"enabling special output mode {currentValue}")

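In Listing 4 you see the output of the program calls, with both valid and invalid program arguments. Note that -verbose (with a single dash) is read as a cluster of short options: -v is accepted, but -e is not a valid option, so getopt() reports an error.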

    $ python arguments-getopt.py -h
    displaying help
    
    $ python arguments-getopt.py --help
    displaying help
    
    $ python arguments-getopt.py --output=green --help -v
    enabling special output mode green
    displaying help
    enabling verbose mode
    
    $ python arguments-getopt.py -verbose
    option -e not recognized
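
To try the getopt steps without a real command line, they can be combined into one function that takes the argument list as a parameter. The evaluate() helper and the notes.txt file name are hypothetical; the option definitions are the ones from the table above. The sketch also shows what ends up in values: everything from the first non-option argument on.

```python
import getopt

UNIX_OPTIONS = "ho:v"
GNU_OPTIONS = ["help", "output=", "verbose"]

def evaluate(argument_list):
    """Return (messages, remaining arguments) for one command line (hypothetical helper).

    getopt.GetoptError propagates to the caller for unknown options.
    """
    arguments, values = getopt.getopt(argument_list, UNIX_OPTIONS, GNU_OPTIONS)
    messages = []
    for current_argument, current_value in arguments:
        if current_argument in ("-v", "--verbose"):
            messages.append("enabling verbose mode")
        elif current_argument in ("-h", "--help"):
            messages.append("displaying help")
        elif current_argument in ("-o", "--output"):
            messages.append(f"enabling special output mode {current_value}")
    return messages, values

print(evaluate(["--output=green", "--help", "-v"]))
print(evaluate(["-o", "red", "notes.txt"]))  # notes.txt ends up in the second list

try:
    evaluate(["-verbose"])
except getopt.GetoptError as err:
    print(err)  # option -e not recognized
```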

#### The argparse Library

> The argparse module makes it easy to write user-friendly command-line interfaces. The program defines what arguments it requires, and argparse will figure out how to parse those out of sys.argv. The argparse module also automatically generates help and usage messages and issues errors when users give the program invalid arguments.

[argparse — Parser for command-line options, arguments and sub-commands](https://docs.python.org/3/library/argparse.html)

One of the strengths of Python is that it comes with batteries included: it has a rich and versatile standard library that makes it one of the best programming languages for writing scripts for the command line. But, if you write scripts for the command line, then you also need to provide a good command line interface, which you can create with the Python argparse library.

The Python argparse library:
- Allows the use of positional arguments
- Allows the customization of the prefix chars
- Supports variable numbers of parameters for a single option
- Supports subcommands (A main command line parser can use other command line parsers depending on some arguments.)

In order to familiarize yourself with this topic, you’re going to read a lot about arguments, options, and parameters, so let’s clarify the terminology right away:
- An argument is a single part of a command line, delimited by blanks.
- An option is a particular type of argument (or a part of an argument) that can modify the behavior of the command line.
- A parameter is a particular type of argument that provides additional information to a single option or command.

Using the Python argparse library has four steps:
- Import the Python argparse library
- Create the parser
- Add optional and positional arguments to the parser
- Execute .parse_args()

After you execute .parse_args(), what you get is a Namespace object that contains a simple property for each input argument received from the command line.
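
The four steps can be sketched in a few lines. The argument names and the description are hypothetical; passing a list to .parse_args() is a documented way to try a parser without a real command line (when called without a list, it reads sys.argv[1:]).

```python
import argparse                                                         # 1. import the library

parser = argparse.ArgumentParser(description='Demo of the four steps')  # 2. create the parser
parser.add_argument('path', help='a positional argument')               # 3. add arguments
parser.add_argument('-v', '--verbose', action='store_true', help='an optional flag')

# 4. parse; the list stands in for the real command line
args = parser.parse_args(['/tmp', '--verbose'])
print(type(args).__name__)      # Namespace
print(args.path, args.verbose)  # /tmp True
```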

In order to see these four steps in detail with an example, let’s pretend you’re creating a program named myls.py that lists the files contained in the current directory. Here’s a possible implementation of your command line interface without using the Python argparse library:

In [None]:
# myls.py
import os
import sys

if len(sys.argv) > 2:
    print('You have specified too many arguments')
    sys.exit(1)  # a non-zero exit status signals an error

if len(sys.argv) < 2:
    print('You need to specify the path to be listed')
    sys.exit(1)

input_path = sys.argv[1]

if not os.path.isdir(input_path):
    print('The path specified does not exist')
    sys.exit(1)

print('\n'.join(os.listdir(input_path)))

The script works, but it behaves quite differently from a standard built-in command: it offers no help or usage message, and every check on the arguments has to be written by hand.

Now, let’s see how the Python argparse library can improve this code:

In [None]:
# myls_argp.py
# Import the argparse library
import argparse

import os
import sys

# Create the parser
# description: for the text that is shown before the help text
my_parser = argparse.ArgumentParser(description='List the content of a folder')

# Add the arguments
my_parser.add_argument('Path',
                       metavar='path',
                       type=str,
                       help='the path to list')

# Execute the parse_args() method
args = my_parser.parse_args()

input_path = args.Path

if not os.path.isdir(input_path):
    print('The path specified does not exist')
    sys.exit(1)  # a non-zero exit status signals an error

print('\n'.join(os.listdir(input_path)))

The code has changed a lot with the introduction of the Python argparse library.

The first big difference compared to the previous version is that the if statements to check the arguments provided by the user are gone because the library will check the presence of the arguments for us.

We’ve imported the Python argparse library, created a simple parser with a brief description of the program’s goal, and defined the positional argument we want to get from the user. Lastly, we have executed .parse_args() to parse the input arguments and get a Namespace object that contains the user input.

    python myls_argp.py
    usage: myls_argp.py [-h] path
    myls_argp.py: error: the following arguments are required: path

As you can see, the program detects that the required positional argument (path) is missing and stops with a specific error message.

You may also have noticed that now your program accepts an optional -h flag, like in the example below:

    python myls_argp.py -h

Good, now the program responds to the -h flag, displaying a help message that tells the user how to use the program. Isn’t that neat, considering that you didn’t even need to ask for that feature?

Lastly, with just four lines of code, now the args variable is a Namespace object, which has a property for each argument that has been gathered from the command line.

##### Setting the Name of the Program

By default, the library uses the value of the sys.argv[0] element to set the name of the program, which as you probably already know is the name of the Python script you have executed. However, you can specify the name of your program just by using the prog keyword:



In [None]:
import argparse

# Create the parser
my_parser = argparse.ArgumentParser(prog='myls',
                                    description='List the content of a folder')
my_parser.print_usage()  # prints: usage: myls [-h]

As you can see, now the program name is just myls instead of myls.py.

##### Setting the Name or Flags of the Arguments

There are basically two different types of arguments that you can add to your command line interface:

- Positional arguments
- Optional arguments

Positional arguments are the ones your command needs to operate.

In the previous example, the argument path was a positional argument, and our program couldn’t work without it. They are called positional because their position defines their function.

For example, consider the cp command on Linux (or the copy command in Windows). Here’s the standard usage:

    cp [OPTION]... [-T] SOURCE DEST

The first positional argument after the cp command is the source of the file you’re going to copy. The second one is the destination where you want to copy it.

Optional arguments are not mandatory, and when they are used they can modify the behavior of the command at runtime. In the cp example, an optional argument is, for example, the -r flag, which makes the command copy directories recursively.

Syntactically, the difference between positional and optional arguments is that optional arguments start with - or --, while positional arguments don’t.

To add an optional argument, you just need to call .add_argument() again and name the new argument with a starting -.
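
A minimal sketch of both kinds side by side, reusing the myls name with a hypothetical -l/--long flag. It also shows that an optional argument may be omitted, or placed before or after the positional one.

```python
import argparse

parser = argparse.ArgumentParser(prog='myls')
parser.add_argument('path')                               # positional: no leading dash
parser.add_argument('-l', '--long', action='store_true')  # optional: leading dash

for command_line in (['/tmp'], ['-l', '/tmp'], ['/tmp', '--long']):
    args = parser.parse_args(command_line)
    print(command_line, '->', args.path, args.long)
```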


##### Name of the Attribute to Be Added to the Object Once Parsed

As you have already seen, when you add an argument to the parser, its value is stored in a property of the Namespace object. By default, this property is named after the first argument passed to .add_argument() for positional arguments, and after the first long option string (or the first short one, if there is no long option) for optional arguments.

If an option uses dashes (as is fairly common), they are converted to underscores in the property name; for example, the value of --dry-run is stored in args.dry_run.

However, it’s possible to specify the name of this property just by using the keyword dest when you’re adding an argument to the parser:

    my_parser.add_argument('-v',
                           '--verbosity',
                           action='store',
                           type=int,
                           dest='my_verbosity_level')

If you run a program containing this argument, the args variable will have a .my_verbosity_level property. By default the property would have been named .verbosity, but since a different name has been specified with the dest keyword, .my_verbosity_level is used instead.
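
A short sketch that checks both naming rules at once: dest overrides the default name, and dashes in a long option become underscores. The --dry-run flag is hypothetical.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-v', '--verbosity', action='store', type=int,
                    dest='my_verbosity_level')
parser.add_argument('--dry-run', action='store_true')  # no dest: name derived from the long option

args = parser.parse_args(['-v', '2', '--dry-run'])
print(args.my_verbosity_level)     # 2
print(args.dry_run)                # True
print(hasattr(args, 'verbosity'))  # False
```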

> A verbose mode is an option available in many computer operating systems, including Microsoft Windows, macOS, and Linux. It provides additional details as to what the computer is doing and what drivers and software it is loading during startup. This level of detail can be very helpful for troubleshooting problems with hardware or software, if errors are occurring during startup or after the operating system has loaded.

##### Setting the Argument Name in Usage Messages

If an argument accepts an input value, it can be useful to give this value a name that the parser can use in the help message. You can do this with the metavar keyword. The following example uses metavar to name the value of the -v flag:

In [None]:
# metavar_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-v',
                       '--verbosity',
                       action='store',
                       type=int,
                       metavar='LEVEL')

args = my_parser.parse_args()

print(vars(args))

Now, if you run your program with the -h flag, the help message refers to the value of the -v flag as LEVEL instead of the default name VERBOSITY.
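
The effect can also be checked without running the script from a shell, because the parser can return its usage line as a string with format_usage(). The prog name is hypothetical and fixed only so that the output does not depend on the script file name.

```python
import argparse

# prog is fixed to a hypothetical name so the usage line is predictable
parser = argparse.ArgumentParser(prog='metavar_example')
parser.add_argument('-v', '--verbosity', action='store', type=int, metavar='LEVEL')

usage = parser.format_usage()
print(usage)  # usage: metavar_example [-h] [-v LEVEL]
```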

##### [Defining Mutually Exclusive Groups](https://realpython.com/command-line-interfaces-python-argparse/#defining-mutually-exclusive-groups)
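
This heading only links to the external tutorial. As a minimal sketch of the idea, argparse's add_mutually_exclusive_group() makes the parser reject a command line that uses more than one option from the group. The -v/--verbose and -s/--silent options and the prog name are hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(prog='group_example')
group = parser.add_mutually_exclusive_group()
group.add_argument('-v', '--verbose', action='store_true')
group.add_argument('-s', '--silent', action='store_true')

print(vars(parser.parse_args(['-v'])))  # {'verbose': True, 'silent': False}

try:
    parser.parse_args(['-v', '-s'])  # argparse prints an error to stderr and exits
except SystemExit as exc:
    print('rejected with exit status', exc.code)  # 2
```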

##### Showing a Brief Description of What an Argument Does

A great feature of the Python argparse library is that, by default, you have the ability to ask for help just by adding the -h flag to your command line.

To make it even better, you can add help text to your arguments, so as to give the users even more help when they execute your program with the -h flag:

In [None]:
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-a',
                       action='store',
                       choices=['head', 'tail'],
                       help='set the user choice to head or tail')

args = my_parser.parse_args()

print(vars(args))

Defining a help text for all the arguments is a really good idea because it makes the usage of your program more clear to the user.

##### Setting Whether the Argument Is Required

If you want to force your user to specify the value for an optional argument, then you can use the required keyword:

In [None]:
# required_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-a',
                       action='store',
                       choices=['head', 'tail'],
                       required=True)

args = my_parser.parse_args()

print(vars(args))

If you use the required keyword set to True for an optional argument, then the user will be forced to set a value for that argument.

That said, please bear in mind that requiring an optional argument is usually considered bad practice since the user wouldn’t expect to have to set a value for an argument that should be optional.
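
A quick way to observe the behavior is to pass argument lists directly to .parse_args(). When the required option is missing, argparse prints an error and raises SystemExit, which the sketch catches only to show the exit status; the prog name is hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(prog='required_example')
parser.add_argument('-a', action='store', choices=['head', 'tail'], required=True)

print(vars(parser.parse_args(['-a', 'head'])))  # {'a': 'head'}

try:
    parser.parse_args([])  # -a is missing
except SystemExit as exc:
    print('exit status', exc.code)  # 2
```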

##### Setting a Domain of Allowed Values for a Specific Argument

Another interesting possibility with the Python argparse library is creating a domain of allowed values for specific arguments. You can do this by providing a list of accepted values when adding the new option:

In [None]:
# choices_ex.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-a', action='store', choices=['head', 'tail'])

args = my_parser.parse_args()

Please note that if you are accepting numeric values, then you can even use range() to specify a range of accepted values:

In [None]:
# choices_ex.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-a', action='store', type=int, choices=range(1, 5))

args = my_parser.parse_args()

print(vars(args))

In this case, the value provided on the command line will be automatically checked against the range defined. If the input number is outside the defined range, then you’ll get an error message.
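
The check happens after the type conversion, so the string from the command line is first turned into an int and then tested for membership in the range. A small sketch with a hypothetical prog name:

```python
import argparse

parser = argparse.ArgumentParser(prog='choices_example')
parser.add_argument('-a', action='store', type=int, choices=range(1, 5))

print(parser.parse_args(['-a', '3']).a)  # 3

try:
    parser.parse_args(['-a', '7'])  # 7 is not in range(1, 5)
except SystemExit as exc:
    print('exit status', exc.code)  # 2
```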

##### Setting the Type of the Argument

By default, all the input argument values are treated as if they were strings. However, it’s possible to define the type for the corresponding property of the Namespace object you get after .parse_args() is invoked just by defining it with the type keyword like this:



In [None]:
# type_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-a', action='store', type=int)

args = my_parser.parse_args()

print(vars(args))

Specifying the int value for the argument, you are telling argparse that the .a property of your Namespace object has to be an int (instead of a string):

    python type_example.py -a 42
    {'a': 42}

Besides, now the value of the argument is checked at runtime, and if there’s a problem with the type of the value provided at the command line, then the execution is interrupted with a clear error message:

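    python type_example.py -a "that's a string"
    usage: type_example.py [-h] [-a A]
    type_example.py: error: argument -a: invalid int value: "that's a string"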

In this case, the error message is very clear because it states that you were expected to pass an int instead of a string.
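
The type keyword is not limited to built-in types: it accepts any callable that takes a string. Below is a sketch with a hypothetical positive_int() converter; raising argparse.ArgumentTypeError lets you supply your own error message.

```python
import argparse

def positive_int(text):
    """Hypothetical converter: accept only integers greater than zero."""
    value = int(text)  # a ValueError here is reported as 'invalid positive_int value'
    if value <= 0:
        raise argparse.ArgumentTypeError(f'{value} is not a positive integer')
    return value

parser = argparse.ArgumentParser(prog='type_example')
parser.add_argument('-a', action='store', type=positive_int)

print(parser.parse_args(['-a', '42']).a)  # 42

try:
    parser.parse_args(['-a', '0'])  # rejected: 0 is not positive
except SystemExit as exc:
    print('exit status', exc.code)  # 2
```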

##### Setting a Default Value Produced if the Argument Is Missing

You already know that the user can decide whether or not to specify optional arguments in the command line. When arguments are not specified, the corresponding value is generally set to None.

However, it is possible to define a default value for an argument when it’s not provided:



In [None]:
# default_example.py
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-a', action='store', default='42')

args = my_parser.parse_args()

print(vars(args))

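    python default_example.py
    {'a': '42'}

You can see that now the option -a is set to '42', even if you didn't explicitly set the value on the command line. The value is the string '42', because the default was given as a string and no type was set.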
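
A related detail: when a default is given as a string and a type is set, argparse runs the default through the type conversion just like a command line value. The option -b is hypothetical and added only for comparison.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-a', action='store', default='42')            # stays a string
parser.add_argument('-b', action='store', type=int, default='42')  # converted by type=int

args = parser.parse_args([])
print(vars(args))  # {'a': '42', 'b': 42}
```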

##### Setting the Action to Be Taken for an Argument

When you add an optional argument to your command line interface, you can also define what kind of action to take when the argument is specified. This means that you usually need to specify how to store the value to the Namespace object you will get when .parse_args() is executed.

There are several actions that are already defined and ready to be used. Let’s analyze them in detail:
- store stores the input value to the Namespace object. (This is the default action.)
- store_const stores a constant value when the corresponding optional arguments are specified.
- store_true stores the Boolean value True when the corresponding optional argument is specified and False otherwise.
- store_false stores the Boolean value False when the corresponding optional argument is specified and True otherwise.
- append stores a list, appending a value to the list each time the option is provided.
- append_const stores a list, appending a constant value to the list each time the option is provided.
- count stores an int equal to the number of times the option has been provided.
- help shows a help text and exits.
- version shows the version of the program and exits.

In [None]:
import argparse

my_parser = argparse.ArgumentParser()
my_parser.add_argument('-a', action='store')
my_parser.add_argument('-b', action='store_const', const=42)
my_parser.add_argument('-c', action='store_true')
my_parser.add_argument('-d', action='store_false')
# the version action prints the version string and exits
my_parser.add_argument('--version', action='version', version='%(prog)s 1.0')

args = my_parser.parse_args()

print(vars(args))
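
The list actions and count are easiest to understand by repeating an option. A sketch with hypothetical options -e, -f, and -g; note that without a default, an omitted count option is None rather than 0.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-e', action='append')                  # collects every value
parser.add_argument('-f', action='append_const', const=42)  # collects the constant
parser.add_argument('-g', action='count')                   # counts occurrences

args = parser.parse_args(['-e', 'one', '-e', 'two', '-f', '-f', '-g', '-g', '-g'])
print(vars(args))  # {'e': ['one', 'two'], 'f': [42, 42], 'g': 3}

print(vars(parser.parse_args([])))  # {'e': None, 'f': None, 'g': None}
```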

##### Example: Extended ls

In [None]:
# myls.py
# Import the argparse library
import argparse

import os
import sys

# Create the parser
my_parser = argparse.ArgumentParser(description='List the content of a folder')

# Add the arguments
my_parser.add_argument('Path',
                       metavar='path',
                       type=str,
                       help='the path to list')
my_parser.add_argument('-l',
                       '--long',
                       action='store_true',
                       help='enable the long listing format')

# Execute parse_args()
args = my_parser.parse_args()

input_path = args.Path

if not os.path.isdir(input_path):
    print('The path specified does not exist')
    sys.exit(1)  # a non-zero exit status signals an error

for line in os.listdir(input_path):
    if args.long:  # Simplified long listing
        size = os.stat(os.path.join(input_path, line)).st_size
        line = f'{size:10d}  {line}'
    print(line)

### Example: Write Python Command-Line Interfaces

**When to Use a Command Line Interface**

Now that you know what a command line interface is, you may be wondering when it’s a good idea to implement one in your programs. The rule of thumb is that, if you want to provide a user-friendly approach to configuring your program, then you should consider a command line interface, and the standard way to do it is by using the Python argparse library.

Even for a complex command line program that needs a configuration file to work, it's a good idea to let the user choose which configuration file to use through a command line interface built with the Python argparse library.

As Python developers, we use and write command line interfaces all the time. On my data science projects, for example, I run several scripts from the command line to train my models and to compute the accuracy of my algorithms.

This is why a good way to improve your productivity is to make your scripts as handy and straightforward as possible, especially when several developers work on the same project.

To achieve that, I advise you to follow four guidelines:
- You should provide default argument values when possible
- All error cases should be handled (e.g., a missing argument, a wrong type, a file not found)
- All arguments and options have to be documented
- A progress bar should be printed for tasks that are not instantaneous

In [8]:
!head ./data/weblog.csv

IP,Time,URL,Staus
10.128.2.1,[29/Nov/2017:06:58:55,GET /login.php HTTP/1.1,200
10.128.2.1,[29/Nov/2017:06:59:02,POST /process.php HTTP/1.1,302
10.128.2.1,[29/Nov/2017:06:59:03,GET /home.php HTTP/1.1,200
10.131.2.1,[29/Nov/2017:06:59:04,GET /js/vendor/moment.min.js HTTP/1.1,200
10.130.2.1,[29/Nov/2017:06:59:06,GET /bootstrap-3.3.7/js/bootstrap.js HTTP/1.1,200
10.130.2.1,[29/Nov/2017:06:59:19,GET /profile.php?user=bala HTTP/1.1,200
10.128.2.1,[29/Nov/2017:06:59:19,GET /js/jquery.min.js HTTP/1.1,200
10.131.2.1,[29/Nov/2017:06:59:19,GET /js/chart.min.js HTTP/1.1,200
10.131.2.1,[29/Nov/2017:06:59:30,GET /edit.php?name=bala HTTP/1.1,200


In [32]:
import csv

def parse_web_logs(input_file, mode='all', header=True):
    ips = {}
    statuses = {}
    with open(input_file, 'r') as f:
        csv_reader = csv.reader(f, delimiter=',')
        line_count = 0
        for row in csv_reader:
            if header and line_count == 0:
                line_count += 1
            else:
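                # skip blank or malformed rows (too few columns, or a first column that is not an IP)
                if len(row) < 4 or not row[0] or row[0].startswith('[') or row[0][0].isalpha():
                    continue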
                ip = row[0]
                status = int(row[3])
                ips[ip] = ips.get(ip,0) + 1
                statuses[status] = statuses.get(status,0) + 1
                line_count += 1 
    
    if mode == 'all':
        return {'ips': ips, 'statuses': statuses}
    elif mode == 'ip':
        return {'ips': ips}
    elif mode == 'status':
        return {'statuses': statuses}
    else:
        return None

In [33]:
parse_web_logs('data/weblog.csv', header=True)

{'ips': {'10.128.2.1': 4257,
  '10.131.2.1': 1626,
  '10.130.2.1': 4056,
  '10.129.2.1': 1652,
  '10.131.0.1': 4198},
 'statuses': {200: 11330, 302: 3498, 304: 658, 206: 52, 404: 251}}

We want something like this:

    python3 parse_logs.py [-h] [-m MODE] [-no] path

> https://docs.python.org/3.8/library/collections.html#collections.Counter
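
The Counter link above suggests a shorter way to do the counting. This is a sketch, not the author's implementation: count_ips_and_statuses() is a hypothetical helper that accepts any iterable of lines, and the sample rows are copied from the head of weblog.csv shown earlier.

```python
import csv
import io
from collections import Counter

def count_ips_and_statuses(lines, header=True):
    """Count IPs and status codes with collections.Counter (hypothetical helper)."""
    ips = Counter()
    statuses = Counter()
    reader = csv.reader(lines)
    if header:
        next(reader, None)  # skip the header row
    for row in reader:
        # skip blank or malformed rows, mirroring the checks in parse_web_logs
        if len(row) < 4 or not row[0] or row[0].startswith('[') or row[0][0].isalpha():
            continue
        ips[row[0]] += 1
        statuses[int(row[3])] += 1
    return ips, statuses

sample = io.StringIO(
    "IP,Time,URL,Staus\n"
    "10.128.2.1,[29/Nov/2017:06:58:55,GET /login.php HTTP/1.1,200\n"
    "10.128.2.1,[29/Nov/2017:06:59:02,POST /process.php HTTP/1.1,302\n"
    "10.131.2.1,[29/Nov/2017:06:59:04,GET /js/vendor/moment.min.js HTTP/1.1,200\n"
)
ips, statuses = count_ips_and_statuses(sample)
print(ips)       # Counter({'10.128.2.1': 2, '10.131.2.1': 1})
print(statuses)  # Counter({200: 2, 302: 1})
```

A Counter also offers most_common(), which would give the most frequent IPs directly.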

The first thing our script needs to do is to get the values of command line arguments.

In [None]:
import argparse
import csv
import sys
import os

def parse_web_logs(input_file, mode='all', header=True):
    ips = {}
    statuses = {}
    with open(input_file, 'r') as f:
        csv_reader = csv.reader(f, delimiter=',')
        line_count = 0
        for row in csv_reader:
            if header and line_count == 0:
                line_count += 1
            else:
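                # skip blank or malformed rows (too few columns, or a first column that is not an IP)
                if len(row) < 4 or not row[0] or row[0].startswith('[') or row[0][0].isalpha():
                    continue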
                ip = row[0]
                status = int(row[3])
                ips[ip] = ips.get(ip,0) + 1
                statuses[status] = statuses.get(status,0) + 1
                line_count += 1 
    
    if mode == 'all':
        print(f'IP: {ips}\nStatus codes: {statuses}')
    elif mode == 'ip':
        print(f'IP: {ips}')
    elif mode == 'status':
        print(f'Status codes: {statuses}')
    else:
        return None

def parse():
    parser = argparse.ArgumentParser(prog='weblogpars',
                                     description='Parse the logs from the web server and count IPs and status codes.')
    
    parser.add_argument('path',
                       metavar='FILE_PATH',
                       type=str,
                       help='the path to the file to parse')

    parser.add_argument('-m', '--mode', 
                        dest='mode', 
                        metavar='MODE', 
                        action='store', 
                        default='all',  
                        help="select the mode of the parser [all, ip, status]", 
                        choices=['all', 'ip', 'status'])

    parser.add_argument('-no', '--no-header', 
                        dest='no_header', 
                        action='store_false',
                        help='add this if there is no header in log file')
    
    args = parser.parse_args()

    file_path = args.path
    mode = args.mode
    header = args.no_header

    
    if os.path.exists(file_path):
        parse_web_logs(file_path, mode=mode, header=header)
    else:
        print('The path specified does not exist!')
        sys.exit()

if __name__ == '__main__':
    parse()
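By default, parse_args() reads sys.argv. Calling parse() directly in a notebook will therefore likely fail, because the kernel is started with its own arguments. A common workaround is to pass an explicit list to parse_args(). The sketch below rebuilds the same parser in a hypothetical helper, build_parser(), so that it can be tried without a terminal.

```python
import argparse


def build_parser():
    """Hypothetical helper: the same parser as in parse(), returned for reuse."""
    parser = argparse.ArgumentParser(
        prog='weblogpars',
        description='Parse the logs from the web server and sum IPs and status codes.')
    parser.add_argument('path', metavar='FILE_PATH', help='the path to the file to parse')
    parser.add_argument('-m', '--mode', default='all', choices=['all', 'ip', 'status'],
                        help='select the mode of the parser')
    parser.add_argument('-no', '--no-header', dest='header', action='store_false',
                        help='use this if the log file has no header row')
    return parser


parser = build_parser()
# parse_args() accepts an explicit list instead of reading sys.argv
args = parser.parse_args(['-m', 'ip', '--no-header', 'data/weblog.csv'])
print(args.path, args.mode, args.header)  # data/weblog.csv ip False

defaults = parser.parse_args(['data/weblog.csv'])
print(defaults.mode, defaults.header)     # all True
```

Keeping parser construction in its own function also makes the command-line interface easy to test separately from the log-parsing logic.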

### Other libraries: Typer

https://typer.tiangolo.com/

    pip install typer

https://typer.tiangolo.com/tutorial/first-steps/

### Other libraries: Click

> requires a separate installation (`pip install click`)

Click is a Python package for creating beautiful command line interfaces in a composable way with as little code as necessary. It’s the “Command Line Interface Creation Kit”. It’s highly configurable but comes with sensible defaults out of the box.

- [Click](https://click.palletsprojects.com/en/7.x/)
- [Welcome to the Click Documentation](https://pocoo-click.readthedocs.io/en/latest/)

In [1]:
import csv
import os
import sys

import click
from tqdm import tqdm  # progress bar; also requires a separate installation


def parse_web_logs(input_file, mode='all', header=True):
    ips = {}
    statuses = {}
    with open(input_file, 'r', newline='') as f:
        csv_reader = csv.reader(f, delimiter=',')
        if header:
            next(csv_reader, None)  # skip the header row
        # tqdm wraps the reader and shows how many rows have been processed
        for row in tqdm(csv_reader):
            if not row or not row[0] or row[0].startswith('[') or row[0][0].isalpha():
                continue
            ip = row[0]
            status = int(row[3])
            ips[ip] = ips.get(ip, 0) + 1
            statuses[status] = statuses.get(status, 0) + 1

    if mode == 'all':
        print(f'IP: {ips}\nStatus codes: {statuses}')
    elif mode == 'ip':
        print(f'IP: {ips}')
    elif mode == 'status':
        print(f'Status codes: {statuses}')


@click.command()
@click.argument('file_path')
@click.option('-m', '--mode',
              default='all',
              metavar='MODE',
              type=click.Choice(['all', 'ip', 'status']),
              help='select the mode of the parser [all, ip, status]')
@click.option('--header/--no-header',
              default=True,
              help='whether the log file has a header row (default: --header)')
def parse(file_path, mode, header):
    if os.path.exists(file_path):
        parse_web_logs(file_path, mode=mode, header=header)
    else:
        click.echo('The path specified does not exist!', err=True)
        sys.exit(1)


if __name__ == '__main__':
    parse()

Usage: ipykernel_launcher.py [OPTIONS] FILE_PATH
Try "ipykernel_launcher.py --help" for help.

Error: no such option: -f


SystemExit: 2

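This error is expected inside Jupyter. The notebook kernel is started with its own command-line arguments (such as `-f`), which Click does not recognize, so Click exits with status 2. Save the code to a file and run it from a terminal instead, for example `python3 parse_logs.py --mode ip data/weblog.csv`.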


## Input function to accept input from a user

The input() function reads a line typed at the console (usually on the keyboard) and returns it as a string without the trailing newline. You can then use this string in your Python code.

### Python example to accept input from a user

Let's see how to read an employee's name from the user with input() and display it with print().

In [2]:
name = input("Enter Employee Name: ")
print(name)

Enter Employee Name: Leon
Leon


Python input() function syntax: `input([prompt])` 

The prompt argument is optional. If it is present, it is written to standard output without a trailing newline, so it serves as a message to the user, for example "Please enter a value: ". When input() executes, the program pauses until the user types a value and presses Enter.

Whatever you type, input() returns it as a string. Even if you enter an integer, you get a string back, so you need to convert it explicitly in your code.

In [3]:
number = input("Enter number")
print("type of number", type(number))

Enter number45
type of number <class 'str'>


### Accept an Integer input from User

Because input() always returns a string, you have to convert the value to an integer explicitly with int(). Let's see how to read integer input from the user:

In [4]:
# Program that adds two numbers entered by the user

first_number = int(input("Enter first number"))
second_number = int(input("Enter second number"))

total = first_number + second_number  # avoid shadowing the built-in sum()

print("Addition of two numbers is: ", total)

Enter first number14
Enter second number45
Addition of two numbers is:  59


In [10]:
# Python program to check whether the user's number is positive, negative or zero
user_number = input("Enter your number")
try:
    val = int(user_number)
    if val > 0:
        print("User number is positive")
    elif val < 0:
        print("User number is negative")
    else:
        print("User number is zero")
except ValueError:
    print("The input is not a valid integer.")

Enter your number-9
User number is negative
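A common extension of the example above is to keep asking until the input is valid. The sketch below uses a hypothetical helper, read_int(). Its read parameter is an assumption added only so that the function can be exercised with simulated answers instead of the keyboard. In normal use it defaults to the built-in input().

```python
def read_int(prompt, read=input):
    """Hypothetical helper: ask repeatedly until the user enters a valid integer."""
    while True:
        text = read(prompt)
        try:
            return int(text)
        except ValueError:
            print(f'{text!r} is not an integer, please try again.')


# Simulate a user who first types "abc" and then "-9"
answers = iter(['abc', '-9'])
value = read_int('Enter your number: ', read=lambda prompt: next(answers))
print(value)  # -9
```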


### Accept float input from User

Float input works the same way as integer input: wrap the input() call in float() to convert the returned string to a floating-point number.

In [11]:
float_number = float(input("Enter a float number"))
print("input float number is: ", float_number)
print("type is:", type(float_number))

Enter a float number8.45
input float number is:  8.45
type is: <class 'float'>


### Get multiple values from the user in one line

In Python, you can also read several values with a single input() call and split the line into separate values.

For example, with one call to input() we can ask the user for their name, age, and phone number and store them in three different variables:

In [12]:
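# split(',') splits the line at the commas; strip() removes the spaces around each value
name, age, phone = [part.strip() for part in input("Enter your name, age and phone number separated by commas: ").split(',')]
print("User Details: ", name, age, phone)

Enter your name, age and phone number separated by commas: Leon, 25, 041596855
User Details:  Leon 25 041596855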
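When all the values are numbers, map() can apply int() to every piece before unpacking. The helper below, parse_two_ints(), is hypothetical. It takes the line as a string so it can be tried without typing. In a program, you would pass it the result of input().

```python
def parse_two_ints(line):
    """Hypothetical helper: split a line such as '14 45' into two integers.

    Raises ValueError if a value is not an integer or if the line
    does not contain exactly two values.
    """
    first, second = map(int, line.split())
    return first, second


print(parse_two_ints('14 45'))  # (14, 45)

try:
    parse_two_ints('1 2 3')
except ValueError as exc:
    # unpacking three values into two variables raises ValueError
    print('Invalid input:', exc)
```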


### Exercise: Get a list of numbers as input from the user

Below are three ways to read a list of numbers from the user; the first one also calculates the sum of the numbers.

In [13]:
# Option 1
input_string = input("Enter a list of numbers separated by spaces: ")
user_list = input_string.split()
print("user list is ", user_list)

print("Calculating the sum of the input list")
total = 0  # avoid shadowing the built-in sum()
for num in user_list:
    total += int(num)
print("Sum = ", total)

Enter a list of numbers separated by spaces: 1 2 5 89 65 8 5 4 5 2  
user list is  ['1', '2', '5', '89', '65', '8', '5', '4', '5', '2']
Calculating the sum of the input list
Sum =  186


In [14]:
# Option 2
numberList = []
n = int(input("Enter the list size : "))
for i in range(0, n):
    print("Enter number at location", i, ":")
    item = int(input())
    numberList.append(item)
print("User List is ", numberList)

Enter the list size : 5
Enter number at location 0 :
5
Enter number at location 1 :
4
Enter number at location 2 :
8
Enter number at location 3 :
8
Enter number at location 4 :
5
User List is  [5, 4, 8, 8, 5]


In [16]:
# Option 3
n = int(input("Enter the size of list : "))
# read all numbers from one line and keep at most n of them
numList = [int(num) for num in input("Enter the list numbers separated by space: ").split()][:n]
print("New List: ", numList)

Enter the size of list : 7
Enter the list numbers separated by space: 4 5 48 9 6 5 8 
New List:  [4, 5, 48, 9, 6, 5, 8]
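The three options can be condensed into a small helper, parse_numbers() (a hypothetical name), that takes the text returned by input() and uses the built-in sum() for the total. Passing the string in as an argument keeps the parsing separate from the user interaction, which makes it easy to test.

```python
def parse_numbers(text, limit=None):
    """Hypothetical helper: turn '4 5 48' into [4, 5, 48].

    split() without arguments splits on any whitespace and ignores
    leading/trailing spaces; limit keeps at most that many numbers.
    """
    return [int(token) for token in text.split()][:limit]


values = parse_numbers('1 2 5 89 65 8 5 4 5 2')
print(values)       # [1, 2, 5, 89, 65, 8, 5, 4, 5, 2]
print(sum(values))  # 186
print(parse_numbers('4 5 48 9 6 5 8', limit=3))  # [4, 5, 48]
```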


## Managing Files and Directories

Python has several built-in modules and functions for handling files. These functions are spread out over several modules such as os, os.path, shutil, and pathlib, to name a few. This section gathers in one place many of the functions you need to know to perform the most common file operations in Python.

[File and Directory Access](https://docs.python.org/3/library/filesys.html)

The os module provides a portable way of using operating-system-dependent functionality. If you just want to read or write a file, see open(); if you want to manipulate paths, see the os.path module; and if you want to read all the lines in all the files on the command line, see the fileinput module. For creating temporary files and directories see the tempfile module, and for high-level file and directory handling see the shutil module.

When is automating a task worth the time? See [this comic](https://xkcd.com/1205/).

### General OS operations

In [15]:
import os

- `os.getcwd()`: Return a string representing the current working directory.

In [24]:
os.getcwd()

'/home/jovyan/work/osnovni_tecaj/10_Interact_with_the_Operating_System'

- `os.chdir(path)`: Change the current working directory to path.

In [None]:
os.chdir('./data')
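os.chdir() changes the working directory for the whole process. As a result, every relative path used afterwards (such as 'data/...') is resolved from the new location. The following is a minimal sketch of a hypothetical working_directory() context manager that restores the previous directory afterwards. Python 3.11 and later also ship contextlib.chdir() for the same purpose.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Hypothetical helper: temporarily change the working directory to path."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # restore the old directory, even if the block raises


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    tmp_real = os.path.realpath(tmp)  # resolve symlinks (e.g. /tmp on macOS)
    with working_directory(tmp):
        inside = os.getcwd()
    print(os.path.realpath(inside) == tmp_real)  # True
print(os.getcwd() == start)  # True
```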

- `os.path.exists(path)`: Return True if path refers to an existing path or an open file descriptor. Returns False for broken symbolic links. On some platforms, this function may return False if permission is not granted to execute os.stat() on the requested file, even if the path physically exists.

In [17]:
os.path.exists('data/')

True

- `os.path.abspath(path)`: Return a normalized absolutized version of the pathname path. On most platforms, this is equivalent to calling the function normpath() as follows: normpath(join(os.getcwd(), path)).

In [18]:
os.path.abspath("data/example.txt")

'/home/jovyan/work/osnovni_tecaj/10_Interact_with_the_Operating_System/data/example.txt'

- `os.path.getsize(path)`: Return the size, in bytes, of path. Raise OSError if the file does not exist or is inaccessible.

In [23]:
os.path.getsize('data/weblog.csv')

1115343

- `os.path.isdir(path)`: Return True if path is an existing directory. This follows symbolic links, so both islink() and isdir() can be true for the same path.

In [27]:
os.path.isdir('data')

True

- `os.path.isfile(path)`: Return True if path is an existing regular file. This follows symbolic links, so both islink() and isfile() can be true for the same path.

In [28]:
os.path.isfile('data/example3.txt')

True

- `os.path.join(path, *paths)`: Join one or more path components intelligently. The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.

In [31]:
os.path.join('/data', 'examples/test.py')

'/data/examples/test.py'
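The description above contains two less obvious rules: an absolute component discards everything before it, and an empty last component adds a trailing separator. Both can be checked directly. The sketch uses posixpath, the POSIX flavour of os.path, so that the output does not depend on the operating system. On Windows, os.path joins with backslashes instead.

```python
import posixpath

# posixpath is the POSIX implementation of os.path
print(posixpath.join('data', 'examples', 'test.py'))  # data/examples/test.py
print(posixpath.join('data', '/tmp', 'test.py'))      # /tmp/test.py (the absolute part wins)
print(posixpath.join('data', 'examples', ''))         # data/examples/ (trailing separator)
```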

- `os.path.getatime(path)`: Return the time of last access of path. The return value is a floating point number giving the number of seconds since the epoch (see the time module). Raise OSError if the file does not exist or is inaccessible.
- `os.path.getmtime(path)`: Return the time of last modification of path. The return value is a floating point number giving the number of seconds since the epoch (see the time module). Raise OSError if the file does not exist or is inaccessible.
- `os.path.getctime(path)`: Return the system’s ctime which, on some systems (like Unix) is the time of the last metadata change, and, on others (like Windows), is the creation time for path. The return value is a number giving the number of seconds since the epoch (see the time module). Raise OSError if the file does not exist or is inaccessible.

In [32]:
timestamp = os.path.getmtime('data/example.txt')
print(timestamp)

1580161039.466197


In [33]:
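# Convert the timestamp to a readable date (UTC)
from datetime import datetime, timezone

def unix_to_str_time(unix_time, time_format='%Y-%m-%d %H:%M:%S'):
    # datetime.utcfromtimestamp() is deprecated since Python 3.12;
    # a timezone-aware datetime in UTC gives the same result
    return datetime.fromtimestamp(unix_time, tz=timezone.utc).strftime(time_format)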

In [34]:
unix_to_str_time(timestamp)

'2020-01-27 21:37:19'

### Making Directories

Sooner or later, the programs you write will have to create directories in order to store data in them. The os and pathlib modules include functions for creating directories; this section covers os.mkdir() and os.makedirs().

#### Creating a Single Directory

To create a single directory, pass a path to the directory as a parameter to os.mkdir():

In [1]:
import os

os.mkdir('data/example_directory/')

If the directory already exists, os.mkdir() raises FileExistsError:

In [2]:
os.mkdir('data/example_directory/')

FileExistsError: [Errno 17] File exists: 'data/example_directory/'

To avoid errors like this, catch the error when it happens and let your user know:

In [3]:
try:
    os.mkdir('data/example_directory/')
except FileExistsError as exc:
    print(exc)

[Errno 17] File exists: 'data/example_directory/'


#### Creating Multiple Directories

os.makedirs() is similar to os.mkdir(). The difference between the two is that not only can os.makedirs() create individual directories, it can also be used to create directory trees. In other words, it can create any necessary intermediate folders in order to ensure a full path exists.

os.makedirs() is similar to running mkdir -p in Bash. For example, to create a nested group of directories such as leto/mesec/dan (Slovenian for year/month/day) inside data, all you have to do is the following:



> Recursive directory creation function. Like mkdir(), but makes all intermediate-level directories needed to contain the leaf directory.

In [5]:
os.makedirs('data/leto/mesec/dan')

.makedirs() creates directories with default permissions. If you need to create directories with different permissions call .makedirs() and pass in the mode you would like the directories to be created in:

In [6]:
os.makedirs('data/leto/2018/10/05', mode=0o770)

This creates the 2018/10/05 directory structure and gives the owner and group users read, write, and execute permissions. The default mode is 0o777, and the file permission bits of existing parent directories are not changed. For more details on file permissions, and how the mode is applied, see the docs.
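Like os.mkdir(), os.makedirs() raises FileExistsError if the leaf directory already exists. Passing exist_ok=True turns the call into a no-op in that case, which is convenient in scripts that may run more than once. The following is a minimal sketch. It works inside a temporary directory, so nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, 'leto', '2018', '10', '05')
    os.makedirs(target, exist_ok=True)
    # without exist_ok=True this second call would raise FileExistsError
    os.makedirs(target, exist_ok=True)
    created = os.path.isdir(target)
    print(created)  # True
```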

[Permissions](https://danielmiessler.com/images/permissions.png)

### Getting a Directory Listing

The built-in os module has a number of useful functions that can be used to list directory contents and filter the results. To get a list of all the files and folders in a particular directory, use os.listdir() or, since Python 3.5, os.scandir(). os.scandir() is the preferred method if you also want to get file and directory properties such as file size and modification date.

- [os.listdir](https://docs.python.org/3/library/os.html#os.listdir)
- [os.scandir](https://docs.python.org/3/library/os.html#os.scandir)
    

> The scandir() function returns directory entries along with file attribute information, giving better performance for many common use cases.

In [17]:
# Directory listing with os.listdir()
import os
entries = os.listdir('./')
print(entries)

['Command_line_with_python.ipynb', 'skripte', '.ipynb_checkpoints', 'data']


In [18]:
# Example: tell directories apart from files
directory = './data'

for name in os.listdir(directory):
    fullname = os.path.join(directory, name)  # join paths portably, regardless of the OS
    if os.path.isdir(fullname):
        print(f'{fullname} is a directory')
    else:
        print(f'{fullname} is a file')

./data/newdir is a directory
./data/example.txt is a file
./data/.ipynb_checkpoints is a directory


In modern versions of Python, an alternative to os.listdir() is to use os.scandir() and pathlib.Path().

os.scandir() was introduced in Python 3.5 and is documented in PEP 471. os.scandir() returns an iterator as opposed to a list when called:

In [19]:
# Directory Listing in Modern Python Versions
import os
entries = os.scandir('./')
print(entries)

<posix.ScandirIterator object at 0x7fb9039daa58>


The ScandirIterator points to all the entries in the current directory. You can loop over the contents of the iterator and print out the filenames:

In [22]:
import os

with os.scandir('./') as entries:
    for entry in entries:
        print(entry.name)

Command_line_with_python.ipynb
skripte
.ipynb_checkpoints
data


Here, os.scandir() is used in conjunction with the with statement because it supports the context manager protocol. Using a context manager closes the iterator and frees up acquired resources automatically after the iterator has been exhausted.

The following example shows a simple use of scandir() to display all the files (excluding directories) in the given path that don’t start with '.'. The entry.is_file() call will generally not make an additional system call:

In [24]:
# Example
with os.scandir('./') as it:
    for entry in it:
        if not entry.name.startswith('.') and entry.is_file():
            print(entry.name)

Command_line_with_python.ipynb


Calling entry.is_file() on each item in the ScandirIterator returns True if the object is a file.

Using pathlib.Path() or os.scandir() instead of os.listdir() is the preferred way of getting a directory listing, especially when you're working with code that needs the file type and file attribute information. pathlib.Path() offers much of the file and path handling functionality found in os and shutil, and its object-oriented interface is often more convenient than the equivalent functions in these modules. We will discuss how to get file properties shortly.
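pathlib.Path() is recommended above but not shown. The sketch below lists files and subdirectories with Path.iterdir(), is_file() and is_dir(), the pathlib counterparts of the os.scandir() examples. It builds a throwaway directory with tempfile, so the example does not depend on the notebook's folders. The file names are illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / 'sub_dir').mkdir()            # a subdirectory
    (base / 'notes.txt').write_text('x')  # a regular file
    (base / '.hidden').write_text('x')    # a hidden file

    # iterdir() yields a Path object for every entry in the directory
    files = sorted(p.name for p in base.iterdir()
                   if p.is_file() and not p.name.startswith('.'))
    dirs = sorted(p.name for p in base.iterdir() if p.is_dir())
    print(files)  # ['notes.txt']
    print(dirs)   # ['sub_dir']
```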

In [25]:
# Listing Subdirectories
# List all subdirectories using scandir()
basepath = './'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_dir():
            print(entry.name)

skripte
.ipynb_checkpoints
data


As in the file listing example, here you call .is_dir() on each entry returned by os.scandir(). If the entry is a directory, .is_dir() returns True and the directory's name is printed, so the output contains only the subdirectories.



### Getting File Attributes

Python makes retrieving file attributes such as file size and modified times easy. This is done through os.stat(), os.scandir(), or pathlib.Path().

os.scandir() and pathlib.Path() retrieve a directory listing with file attributes combined. This can be potentially more efficient than using os.listdir() to list files and then getting file attribute information for each file.

The example below shows when each file and folder in the current directory (./) was last modified. The output is in seconds since the epoch:

In [41]:
import os
with os.scandir('./') as dir_contents:
    for entry in dir_contents:
        info = entry.stat()
        #print(info)
        print(f'Name: {entry.name}, Time: {info.st_mtime}s')

Name: Command_line_with_python.ipynb, Time: 1580251977.5320895s
Name: skripte, Time: 1580242340.9039881s
Name: .ipynb_checkpoints, Time: 1579434568.7935946s
Name: data, Time: 1580162510.3971303s


[os.stat_result](https://docs.python.org/3/library/os.html#os.stat_result)

os.scandir() returns a ScandirIterator object. Each entry in a ScandirIterator object has a .stat() method that retrieves information about the file or directory it points to. .stat() provides information such as file size and the time of last modification. In the example above, the code prints out the st_mtime attribute, which is the time the content of the file was last modified.

The st_mtime attribute returns a float value that represents seconds since the epoch. To convert the values returned by st_mtime for display purposes, you could write a helper function to convert the seconds into a datetime object:



In [39]:
from datetime import datetime, timezone
from os import scandir

def convert_date(timestamp):
    # interpret the timestamp as UTC (replacement for the deprecated utcfromtimestamp())
    d = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    formatted_date = d.strftime('%Y-%m-%d %H:%M:%S')
    return formatted_date

def get_files(dir_path):
    # the with statement closes the iterator when the loop is done
    with scandir(dir_path) as dir_entries:
        for entry in dir_entries:
            if entry.is_file():
                info = entry.stat()
                # :20 pads the file name to a width of 20 characters
                print(f'{entry.name:20}\t Last Modified: {convert_date(info.st_mtime)}')

get_files('./skripte')

example04.py        	 Last Modified: 2020-01-19 16:26:12
example00.py        	 Last Modified: 2020-01-19 14:46:31
example03.py        	 Last Modified: 2020-01-19 14:50:58
create_file.py      	 Last Modified: 2020-01-28 19:48:09
host.py             	 Last Modified: 2020-01-28 21:20:55
healthcheck_script.py	 Last Modified: 2020-01-27 20:41:51
example01.py        	 Last Modified: 2020-01-19 14:43:19
example05.py        	 Last Modified: 2020-01-25 23:07:59
example02.py        	 Last Modified: 2020-01-19 14:43:03


This first gets the files in ./skripte and their attributes and then calls convert_date() to convert each file's last modified time into a human-readable form. convert_date() uses .strftime() to convert the datetime object into a string.

### Filename Pattern Matching

After getting a list of files in a directory using one of the methods above, you will most probably want to search for files that match a particular pattern.

These are the methods and functions available to you:
- endswith() and startswith() string methods
- fnmatch.fnmatch()
- glob.glob()
- pathlib.Path.glob()

The examples in this section use a directory called some_directory. You can create it and its files with the following shell commands, run inside the data folder (the examples below use the path data/some_directory/):

    mkdir some_directory
    cd some_directory/
    mkdir sub_dir
    touch sub_dir/file1.py sub_dir/file2.py
    touch data_{01..03}.txt data_{01..03}_backup.txt admin.py tests.py

#### Using String Methods

Python has several built-in methods for modifying and manipulating strings. Two of these methods, .startswith() and .endswith(), are useful when you’re searching for patterns in filenames. To do this, first get a directory listing and then iterate over it:

In [18]:
import os

# Get .txt files
def search_file_by_extension(directory, extension):
    for f_name in os.listdir(directory):
        if f_name.endswith(extension):
            print(f_name)

In [19]:
search_file_by_extension('data/some_directory/', '.txt')

data_01_backup.txt
data_02_backup.txt
data_01.txt
data_03_backup.txt
data_03.txt
data_02.txt


#### Filename Pattern Matching Using fnmatch

> [fnmatch — Unix filename pattern matching](https://docs.python.org/3.8/library/fnmatch.html)

String methods are limited in their matching abilities. fnmatch has more advanced functions and methods for pattern matching. We will consider fnmatch.fnmatch(), a function that supports the use of wildcards such as * and ? to match filenames. For example, in order to find all .txt files in a directory using fnmatch, you would do the following:

In [20]:
import os
import fnmatch

for file_name in os.listdir('data/some_directory/'):
    if fnmatch.fnmatch(file_name, '*.txt'):
        print(file_name)

data_01_backup.txt
data_02_backup.txt
data_01.txt
data_03_backup.txt
data_03.txt
data_02.txt


This iterates over the list of files in some_directory and uses .fnmatch() to perform a wildcard search for files that have the .txt extension.

Let’s suppose you want to find .txt files that meet certain criteria. For example, you could be only interested in finding .txt files that contain the word data, a number between a set of underscores, and the word backup in their filename. Something similar to data_01_backup, data_02_backup, or data_03_backup.

Using fnmatch.fnmatch(), you could do it this way:

In [22]:
for filename in os.listdir('data/some_directory/'):
    if fnmatch.fnmatch(filename, 'data_*_backup.txt'):
        print(filename)

data_01_backup.txt
data_02_backup.txt
data_03_backup.txt


Here, you print only the names of files that match the data_*_backup.txt pattern. The asterisk in the pattern matches any sequence of characters (including none), so this finds all files whose names start with data_ and end in _backup.txt.
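fnmatch also provides fnmatch.filter(), which applies a pattern to a whole list of names at once. Besides *, it supports ? (exactly one character) and [seq] (one character from a set). The list of names below is hard-coded for illustration, using a subset of the files in data/some_directory, instead of coming from os.listdir().

```python
import fnmatch

# Hypothetical directory listing (a subset of data/some_directory/)
names = ['admin.py', 'tests.py', 'data_01.txt', 'data_02.txt',
         'data_01_backup.txt', 'data_02_backup.txt']

# fnmatch.filter() returns the names that match the pattern, in their original order
backups = fnmatch.filter(names, 'data_*_backup.txt')
print(backups)  # ['data_01_backup.txt', 'data_02_backup.txt']

# ? matches exactly one character, [0-9] matches one digit
singles = fnmatch.filter(names, 'data_0?.txt')
print(singles)  # ['data_01.txt', 'data_02.txt']
print(fnmatch.filter(names, 'data_[0-9][0-9]_*'))  # ['data_01_backup.txt', 'data_02_backup.txt']
```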

#### Filename Pattern Matching Using glob

The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched. This is done by using the os.scandir() and fnmatch.fnmatch() functions in concert, and not by actually invoking a subshell. Note that unlike fnmatch.fnmatch(), glob treats filenames beginning with a dot (.) as special cases.

> [glob — Unix style pathname pattern expansion](https://docs.python.org/3.8/library/glob.html?highlight=glob#module-glob)

Another useful module for pattern matching is glob.

.glob() in the glob module works just like fnmatch.fnmatch(), but unlike fnmatch.fnmatch(), it treats files beginning with a period (.) as special.

UNIX and related systems translate name patterns with wildcards like ? and * into a list of files. This is called globbing.

For example, typing mv *.py python_files/ in a UNIX shell moves (mv) all files with the .py extension from the current directory to the directory python_files. The * character is a wildcard that means “any number of characters,” and *.py is the glob pattern. This shell capability is not available in the Windows Operating System. The glob module adds this capability in Python, which enables Windows programs to use this feature.

Here's an example of how to use glob to search for all Python (.py) source files in the data/some_directory/sub_dir directory:

In [24]:
import glob
glob.glob('data/some_directory/sub_dir/*.py')

['data/some_directory/sub_dir/file1.py',
 'data/some_directory/sub_dir/file2.py']

glob.glob('*.py') searches for all files that have the .py extension in the current directory and returns them as a list. glob also supports shell-style wildcards to match patterns:

In [26]:
import glob
for name in glob.glob('data/some_directory/*[0-9]*.txt'):
    print(name)

data/some_directory/data_01_backup.txt
data/some_directory/data_02_backup.txt
data/some_directory/data_01.txt
data/some_directory/data_03_backup.txt
data/some_directory/data_03.txt
data/some_directory/data_02.txt


glob makes it easy to search for files recursively in subdirectories too:

In [29]:
import glob
for file in glob.iglob('data/some_directory/**/*.py', recursive=True):
    print(file)

data/some_directory/tests.py
data/some_directory/admin.py
data/some_directory/sub_dir/file1.py
data/some_directory/sub_dir/file2.py


This example uses glob.iglob() to search for .py files in data/some_directory and all of its subdirectories. With recursive=True, the ** pattern matches any files and zero or more directories and subdirectories. The difference between glob.iglob() and glob.glob() is that .iglob() returns an iterator instead of a list.
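The list at the start of this section also mentions pathlib.Path.glob(). The following is a minimal sketch that recreates a small copy of some_directory in a temporary folder. Path.glob() works like glob.glob() but returns Path objects. Path.rglob(pattern) behaves like calling glob() with '**/' in front of the pattern.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / 'some_directory'
    (root / 'sub_dir').mkdir(parents=True)
    for name in ['admin.py', 'tests.py', 'data_01.txt', 'sub_dir/file1.py']:
        (root / name).touch()

    # glob() matches only inside root; rglob() also searches every subdirectory
    top_level = sorted(p.name for p in root.glob('*.py'))
    recursive = sorted(p.relative_to(root).as_posix() for p in root.rglob('*.py'))
    print(top_level)  # ['admin.py', 'tests.py']
    print(recursive)  # ['admin.py', 'sub_dir/file1.py', 'tests.py']
```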

### Traversing Directories and Processing Files

A common programming task is walking a directory tree and processing the files in it. Let's explore how os.walk() from the os module can be used to do this. os.walk() generates the file names in a directory tree by walking the tree either top-down or bottom-up. In this section, we'll work with the some_directory tree created above.

The following is an example that shows you how to list all files and directories in a directory tree using os.walk().

os.walk() defaults to traversing directories in a top-down manner:

In [30]:
# Walking a directory tree and printing the names of the directories and files
for dirpath, dirnames, files in os.walk('./data/some_directory/'):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)

Found directory: ./data/some_directory/
tests.py
data_01_backup.txt
data_02_backup.txt
admin.py
data_01.txt
data_03_backup.txt
data_03.txt
data_02.txt
Found directory: ./data/some_directory/sub_dir
file1.py
file2.py


os.walk() returns three values on each iteration of the loop:
- The name of the current folder
- A list of folders in the current folder
- A list of files in the current folder

On each iteration, the loop above prints the current directory followed by the names of the files it contains.

To traverse the directory tree in a bottom-up manner, pass the topdown=False keyword argument to os.walk(). os.walk() then yields the contents of the subdirectories before the contents of their parent directory. This is very useful when you want to delete files and directories recursively, because os.rmdir() can only remove a directory once it is empty. You will learn how to do this in the sections below. By default, os.walk() does not walk down into symbolic links that resolve to directories; you can override this behavior by passing followlinks=True.
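The following is a minimal sketch of a bottom-up walk. It uses a small temporary tree instead of some_directory. With topdown=False, the subdirectory sub_dir is reported before the root directory, which appears as '.'.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub_dir'))
    open(os.path.join(tmp, 'admin.py'), 'w').close()
    open(os.path.join(tmp, 'sub_dir', 'file1.py'), 'w').close()

    order = []
    # topdown=False: a directory is yielded only after all of its subdirectories
    for dirpath, dirnames, files in os.walk(tmp, topdown=False):
        rel = os.path.relpath(dirpath, tmp)  # '.' is the root directory itself
        order.append(rel)
        print(f'Found directory: {rel}, files: {files}')

print(order)  # ['sub_dir', '.']
```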

In [31]:
# Exercise: write some content into every file
# Warning: mode 'w' overwrites the existing content of each file
for dirpath, dirnames, files in os.walk('./data/some_directory/'):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        fullname = os.path.join(dirpath, file_name)
        print(f'Editing {fullname}...')
        with open(fullname, 'w') as f:
            f.write('Hello')

Found directory: ./data/some_directory/
Editing ./data/some_directory/tests.py...
Editing ./data/some_directory/data_01_backup.txt...
Editing ./data/some_directory/data_02_backup.txt...
Editing ./data/some_directory/admin.py...
Editing ./data/some_directory/data_01.txt...
Editing ./data/some_directory/data_03_backup.txt...
Editing ./data/some_directory/data_03.txt...
Editing ./data/some_directory/data_02.txt...
Found directory: ./data/some_directory/sub_dir
Editing ./data/some_directory/sub_dir/file1.py...
Editing ./data/some_directory/sub_dir/file2.py...


### Deleting Files and Directories

You can delete single files, directories, and entire directory trees using the methods found in the os, shutil, and pathlib modules. The following sections describe how to delete files and directories that you no longer need.

#### Deleting Files in Python

To delete a single file, use pathlib.Path.unlink(), os.remove(), or os.unlink().

os.remove() and os.unlink() are semantically identical. To delete a file using os.remove(), do the following:

In [1]:
! touch ./data/delete.txt

In [2]:
import os

data_file = './data/delete.txt'
os.remove(data_file)

Deleting a file using os.unlink() is similar to how you do it using os.remove():

In [None]:
import os

data_file = './data/delete.txt'
os.unlink(data_file)

Calling .unlink() or .remove() on a file deletes the file from the filesystem. These two functions will throw an OSError if the path passed to them points to a directory instead of a file. To avoid this, you can either check that what you’re trying to delete is actually a file and only delete it if it is, or you can use exception handling to handle the OSError:

In [3]:
import os

data_file = './data/delete.txt'

# If the file exists, delete it
if os.path.isfile(data_file):
    os.remove(data_file)
else:
    print(f'Error: {data_file} not a valid filename')

Error: ./data/delete.txt not a valid filename


os.path.isfile() checks whether data_file is actually a file. If it is, it is deleted by the call to os.remove(). If data_file points to a folder, an error message is printed to the console.

The following example shows how to use exception handling to handle errors when deleting files:

In [4]:
import os

data_file = './data/delete.txt'

# Use exception handling
try:
    os.remove(data_file)
except OSError as e:
    print(f'Error: {data_file} : {e.strerror}')

Error: ./data/delete.txt : No such file or directory


The code above tries to delete the file without checking its type first. If data_file doesn’t exist or isn’t a regular file, the OSError that is raised is handled in the except clause, and an error message is printed to the console. The error message is formatted using a Python f-string.
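The text above lists pathlib.Path.unlink() as a third option but does not show it. The sketch below assumes Python 3.8 or newer, where unlink() accepts missing_ok=True to ignore a file that is already gone. The temporary directory and the file name delete.txt are only there to keep the example self-contained:

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so the example never touches real data
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / 'delete.txt'
    data_file.touch()        # create an empty file
    data_file.unlink()       # delete it
    existed_after = data_file.exists()

    # missing_ok=True (Python 3.8+) suppresses FileNotFoundError
    data_file.unlink(missing_ok=True)

print(existed_after)
```

With missing_ok=True there is no need for an explicit existence check or a try/except block when a missing file is acceptable.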

#### Deleting Directories

The standard library offers the following functions for deleting directories:
- os.rmdir()
- pathlib.Path.rmdir()
- shutil.rmtree()

To delete a single directory or folder, use os.rmdir() or pathlib.Path.rmdir(). These two functions only work if the directory you’re trying to delete is empty. If the directory isn’t empty, an OSError is raised. Here is how to delete a folder:

In [5]:
!mkdir ./data/test

In [6]:
import os

trash_dir = './data/test'

try:
    os.rmdir(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')

Here, the trash_dir directory is deleted by passing its path to os.rmdir(). If the directory isn’t empty, os.rmdir() raises an OSError; the except clause catches it and prints an error message instead. Without the try block, the error would appear as a traceback like this one:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 39] Directory not empty: 'my_documents/bad_dir'
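The same behaviour can be seen with pathlib.Path.rmdir(). This is a minimal sketch that builds a hypothetical empty and non-empty directory inside a temporary folder, removes the empty one, and catches the OSError raised for the non-empty one:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    empty_dir = Path(tmp) / 'empty'
    full_dir = Path(tmp) / 'full'
    empty_dir.mkdir()
    full_dir.mkdir()
    (full_dir / 'note.txt').touch()

    empty_dir.rmdir()  # succeeds: the directory is empty

    try:
        full_dir.rmdir()  # fails: the directory still contains a file
        removed_full = True
    except OSError as e:
        removed_full = False
        print(f'Error: {full_dir} : {e.strerror}')

    empty_gone = not empty_dir.exists()
```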

#### Deleting Entire Directory Trees

To delete non-empty directories and entire directory trees, Python offers shutil.rmtree():

In [7]:
!mkdir ./data/test
!touch ./data/test/lalal

In [8]:
import shutil

trash_dir = './data/test'

try:
    shutil.rmtree(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')

Everything in trash_dir is deleted when shutil.rmtree() is called on it. There may be cases where you want to delete empty folders recursively. You can do this using one of the methods discussed above in conjunction with os.walk():

In [None]:
import os

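# Walk bottom-up (topdown=False) so that child directories are visited,
# and possibly removed, before their parents
for dirpath, dirnames, files in os.walk('.', topdown=False):
    try:
        os.rmdir(dirpath)  # succeeds only if the directory is empty
    except OSError:
        pass  # not empty (or not removable): skip it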

This walks down the directory tree and tries to delete each directory it finds. If the directory isn’t empty, an OSError is raised and that directory is skipped. The table below lists the functions covered in this section:

<div class="table-responsive">
<table class="table table-hover">
<thead>
<tr>
<th>Function</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>os.remove()</code></td>
<td>Deletes a file and does not delete directories</td>
</tr>
<tr>
<td><code>os.unlink()</code></td>
<td>Is identical to <code>os.remove()</code> and deletes a single file</td>
</tr>
<tr>
<td><code>pathlib.Path.unlink()</code></td>
<td>Deletes a file and cannot delete directories</td>
</tr>
<tr>
<td><code>os.rmdir()</code></td>
<td>Deletes an empty directory</td>
</tr>
<tr>
<td><code>pathlib.Path.rmdir()</code></td>
<td>Deletes an empty directory</td>
</tr>
<tr>
<td><code>shutil.rmtree()</code></td>
<td>Deletes an entire directory tree and can be used to delete non-empty directories</td>
</tr>
</tbody>
</table>
</div>

### Copying, Moving, and Renaming Files and Directories

Python ships with the shutil module. shutil is short for shell utilities. It provides a number of high-level operations on files to support copying, archiving, and removal of files and directories. In this section, you’ll learn how to move and copy files and directories.

#### Copying Files in Python

shutil offers a couple of functions for copying files. The most commonly used functions are shutil.copy() and shutil.copy2(). To copy a file from one location to another using shutil.copy(), do the following:

In [9]:
import shutil

src = './data/example.txt'
dst = './data/example2.txt'
shutil.copy(src, dst)

'./data/example2.txt'

shutil.copy() is comparable to the cp command on UNIX-based systems. shutil.copy(src, dst) copies the file src to the location specified in dst. If dst is a file, its contents are replaced with the contents of src. If dst is a directory, src is copied into that directory under its own file name. shutil.copy() copies only the file’s contents and its permission bits; other metadata, such as the file’s creation and modification times, is not preserved.
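To illustrate the directory case, the sketch below copies a hypothetical example.txt into an existing backup directory. shutil.copy() returns the path of the new file, which keeps the source’s name inside the target directory:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'example.txt')
    with open(src, 'w') as f:
        f.write('hello')

    target_dir = os.path.join(tmp, 'backup')
    os.mkdir(target_dir)

    # dst is a directory, so the file keeps its name inside that directory
    copied_path = shutil.copy(src, target_dir)
    expected_path = os.path.join(target_dir, 'example.txt')
    with open(copied_path) as f:
        copied_text = f.read()
```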

To preserve as much file metadata as possible when copying, use shutil.copy2():

In [11]:
import shutil

src = './data/example.txt'
dst = './data/example3.txt'
shutil.copy2(src, dst)

'./data/example3.txt'

Using .copy2() preserves details about the file such as last access time, permission bits, last modification time, and flags.
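One way to see the difference is to compare modification times. This sketch, using made-up file names in a temporary directory, sets an old timestamp on the source and then copies it with both functions; only the copy2() result keeps that timestamp:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'example.txt')
    with open(src, 'w') as f:
        f.write('data')

    # Give the source a fixed, old modification time (1 Jan 2000, UTC)
    old_time = 946684800
    os.utime(src, (old_time, old_time))

    plain = shutil.copy(src, os.path.join(tmp, 'plain.txt'))
    full = shutil.copy2(src, os.path.join(tmp, 'full.txt'))

    plain_mtime = os.stat(plain).st_mtime  # time of the copy
    full_mtime = os.stat(full).st_mtime    # copied from the source
```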

> Warning: Even the higher-level file copying functions (shutil.copy(), shutil.copy2()) cannot copy all file metadata.
On POSIX platforms, this means that file owner and group are lost as well as ACLs. On Mac OS, the resource fork and other metadata are not used. This means that resources will be lost and file type and creator codes will not be correct. On Windows, file owners, ACLs and alternate data streams are not copied.

#### Copying Directories

While shutil.copy() only copies a single file, shutil.copytree() will copy an entire directory and everything contained in it. shutil.copytree(src, dest) takes two arguments: a source directory and the destination directory where files and folders will be copied to.

Here’s an example of how to copy the contents of one folder to a different location:

In [None]:
import shutil
shutil.copytree('data_1', 'data1_backup')

In this example, .copytree() copies the contents of data_1 to a new location data1_backup and returns the destination directory. By default, the destination directory must not already exist; it is created together with any missing parent directories. Since Python 3.8, passing dirs_exist_ok=True allows copying into an existing directory. shutil.copytree() is a good way to back up your files.
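copytree() also accepts an ignore callable; shutil.ignore_patterns() builds one from glob patterns. The sketch below, assuming Python 3.8+ for dirs_exist_ok and using a hypothetical data_1 tree, skips *.log files and then copies a second time into the now-existing backup:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'data_1')
    os.makedirs(os.path.join(src, 'sub'))
    for name in ('a.txt', 'b.log', os.path.join('sub', 'c.txt')):
        with open(os.path.join(src, name), 'w') as f:
            f.write(name)

    dst = os.path.join(tmp, 'data1_backup')
    skip_logs = shutil.ignore_patterns('*.log')
    shutil.copytree(src, dst, ignore=skip_logs)
    # A second copy into the existing dst only works with dirs_exist_ok=True
    shutil.copytree(src, dst, ignore=skip_logs, dirs_exist_ok=True)

    # Collect every copied file path relative to the backup root
    copied = sorted(
        os.path.relpath(os.path.join(root, f), dst)
        for root, _, files in os.walk(dst)
        for f in files
    )
```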

#### Moving Files and Directories

To move a file or directory to another location, use shutil.move(src, dst).

src is the file or directory to be moved and dst is the destination:

In [None]:
import shutil
shutil.move('dir_1/', 'backup/')

shutil.move('dir_1/', 'backup/') moves dir_1/ into backup/ if backup/ exists. If backup/ does not exist, dir_1/ is renamed to backup/.
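Both outcomes can be checked side by side. In this sketch the directory names are hypothetical and everything lives in a temporary folder:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    dir_1 = os.path.join(tmp, 'dir_1')
    dir_2 = os.path.join(tmp, 'dir_2')
    backup = os.path.join(tmp, 'backup')
    for path in (dir_1, dir_2, backup):
        os.mkdir(path)

    # backup/ exists, so dir_1 ends up inside it
    shutil.move(dir_1, backup)
    # renamed/ does not exist, so dir_2 is simply renamed
    shutil.move(dir_2, os.path.join(tmp, 'renamed'))

    inside_backup = os.path.isdir(os.path.join(backup, 'dir_1'))
    renamed = os.path.isdir(os.path.join(tmp, 'renamed'))
    leftovers = sorted(os.listdir(tmp))
```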

#### Renaming Files and Directories

Python includes os.rename(src, dst) for renaming files and directories:

In [None]:
import os

os.rename('first.zip', 'first_01.zip')

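The line above renames first.zip to first_01.zip. What happens when the destination already exists depends on the platform: on Windows, os.rename() raises FileExistsError; on Unix, an existing destination file is silently replaced, while renaming a file onto a directory raises an OSError (IsADirectoryError).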
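If an existing destination file should simply be overwritten on every platform, os.replace() can be used instead of os.rename(). A minimal sketch with hypothetical file names in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, 'first.zip')
    target = os.path.join(tmp, 'first_01.zip')
    for path, content in ((first, 'new'), (target, 'old')):
        with open(path, 'w') as f:
            f.write(content)

    # os.replace() overwrites an existing destination file on all platforms
    os.replace(first, target)

    with open(target) as f:
        target_content = f.read()
    remaining = sorted(os.listdir(tmp))
```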

### Archiving

Archives are a convenient way to package several files into one. The two most common archive types are ZIP and TAR. The Python programs you write can create, read, and extract data from archives. You will learn how to read and write to both archive formats in this section.

More [here](https://realpython.com/working-with-files-in-python/#archiving).

#### ZIP Files

The zipfile module is a low-level module that is part of the Python Standard Library. zipfile has functions that make it easy to open and extract ZIP files.

[zipfile — Work with ZIP archives](https://docs.python.org/3/library/zipfile.html?highlight=zip#module-zipfile)


The ZIP file format is a common archive and compression standard. This module provides tools to create, read, write, append, and list a ZIP file. Any advanced use of this module will require an understanding of the format, as defined in PKZIP Application Note.

This module does not currently handle multi-disk ZIP files. It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 4 GiB in size). It supports decryption of encrypted files in ZIP archives, but it currently cannot create an encrypted file. Decryption is extremely slow as it is implemented in native Python rather than C.
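A short round trip shows the basic zipfile workflow: write members into a new archive, then list and extract them. The file names are made up for the example, and ZIP_DEFLATED assumes the zlib module is available:

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Two small, highly repetitive files to put into the archive
    for name in ('a.txt', 'b.txt'):
        with open(os.path.join(tmp, name), 'w') as f:
            f.write(name * 100)

    archive = os.path.join(tmp, 'files.zip')
    # 'w' creates a new archive; ZIP_DEFLATED compresses each member
    with zipfile.ZipFile(archive, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        for name in ('a.txt', 'b.txt'):
            zf.write(os.path.join(tmp, name), arcname=name)

    # Read the archive back: list members, inspect one, extract everything
    with zipfile.ZipFile(archive) as zf:
        members = zf.namelist()
        info = zf.getinfo('a.txt')
        zf.extractall(os.path.join(tmp, 'extracted'))

    extracted = sorted(os.listdir(os.path.join(tmp, 'extracted')))
```

info.file_size is the original size of a member and info.compress_size its size inside the archive.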

#### TAR Archives

Like ZIP files, TAR files bundle several files into a single archive, but a plain TAR archive is not compressed. It can be compressed using gzip, bzip2, or lzma. The TarFile class allows reading and writing of TAR archives.

[tarfile — Read and write tar archive files](https://docs.python.org/3/library/tarfile.html?highlight=tar#module-tarfile)

The tarfile module makes it possible to read and write tar archives, including those using gzip, bz2 and lzma compression. Use the zipfile module to read or write .zip files, or the higher-level functions in shutil.
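For comparison with the ZIP example, here is a minimal sketch that writes a gzip-compressed TAR of a hypothetical leto directory and lists its members. Mode 'w:gz' selects gzip on writing, and 'r:*' lets tarfile detect the compression when reading:

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, 'leto')
    os.mkdir(src_dir)
    for name in ('jan.txt', 'feb.txt'):
        with open(os.path.join(src_dir, name), 'w') as f:
            f.write(name)

    archive = os.path.join(tmp, 'leto.tar.gz')
    # add() recurses into the directory; arcname sets the name inside the TAR
    with tarfile.open(archive, 'w:gz') as tar:
        tar.add(src_dir, arcname='leto')

    # 'r:*' detects the compression method automatically
    with tarfile.open(archive, 'r:*') as tar:
        names = sorted(tar.getnames())
```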

#### An Easier Way of Creating Archives

The Python Standard Library also supports creating TAR and ZIP archives using the high-level methods in the shutil module. The archiving utilities in shutil allow you to create, read, and extract ZIP and TAR archives. These utilities rely on the lower level tarfile and zipfile modules.

shutil.make_archive() takes at least two arguments: the name of the archive and an archive format.

By default, it archives all the files in the current directory using the archive format specified in the format argument. You can pass an optional root_dir argument to archive files from a different directory. .make_archive() supports the zip, tar, gztar, bztar, and xztar archive formats.

This is how to create a TAR archive using shutil:

In [13]:
import shutil

# shutil.make_archive(base_name, format, root_dir)
shutil.make_archive('./data/leto', 'tar', './data/leto')

'/home/jovyan/work/osnovni_tecaj/10_Interact_with_the_Operating_System/data/leto.tar'

- base_name is the name of the file to create, including the path, minus any format-specific extension.
- format is the archive format: one of "zip" (if the zlib module is available), "tar", "gztar" (if the zlib module is available), "bztar" (if the bz2 module is available), or "xztar" (if the lzma module is available).
- root_dir is the directory that becomes the root of the archive; paths inside the archive are stored relative to it, as if the function first changed into root_dir before creating the archive.

This archives everything in ./data/leto/, writes the result to ./data/leto.tar, and returns the archive’s path. To extract the archive, call .unpack_archive():

In [14]:
shutil.unpack_archive('./data/leto.tar', 'data/extract_dir/')

Calling .unpack_archive() with an archive name and a destination directory extracts the contents of leto.tar into data/extract_dir/. ZIP archives can be created and extracted in the same way.
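The ZIP variant of the same round trip might look like this. The directory and file names are hypothetical; note that base_name has no extension, because make_archive() appends '.zip' itself, and unpack_archive() infers the format from that extension:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, 'leto')
    os.mkdir(src_dir)
    with open(os.path.join(src_dir, 'notes.txt'), 'w') as f:
        f.write('summer')

    # Archive the contents of src_dir into <tmp>/leto.zip
    archive = shutil.make_archive(os.path.join(tmp, 'leto'), 'zip', src_dir)
    archive_name = os.path.basename(archive)

    # Extract into a new directory; the format comes from the extension
    extract_dir = os.path.join(tmp, 'extract_dir')
    shutil.unpack_archive(archive, extract_dir)
    with open(os.path.join(extract_dir, 'notes.txt')) as f:
        restored = f.read()
```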

## System Monitoring

psutil (process and system utilities) is a cross-platform library for retrieving information on running processes and system utilization (CPU, memory, disks, network, sensors) in Python. It is useful mainly for system monitoring, profiling and limiting process resources and management of running processes. It implements many functionalities offered by classic UNIX command line tools such as ps, top, iotop, lsof, netstat, ifconfig, free and others. 

[psutil documentation](https://psutil.readthedocs.io/en/latest/#)

    pip install psutil

In [72]:
import psutil

In [71]:
psutil.cpu_percent()

0.5

> psutil.cpu_percent(interval=None, percpu=False): Return a float representing the current system-wide CPU utilization as a percentage. When interval is > 0.0 compares system CPU times elapsed before and after the interval (blocking). When interval is 0.0 or None compares system CPU times elapsed since last call or module import, returning immediately. That means the first time this is called it will return a meaningless 0.0 value which you are supposed to ignore. In this case it is recommended for accuracy that this function be called with at least 0.1 seconds between calls. When percpu is True returns a list of floats representing the utilization as a percentage for each CPU. First element of the list refers to first CPU, second element to second CPU and so on. The order of the list is consistent across calls.

In [73]:
import shutil
du = shutil.disk_usage("/")

In [74]:
du

usage(total=42004086784, used=15727157248, free=24112828416)

In [75]:
du.free /du.total *100

57.4059103819987
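The raw byte counts are easier to read in GiB. The helper below, disk_report(), is a hypothetical convenience function built only on shutil.disk_usage(). On many Linux filesystems part of the space is reserved, which would explain why used + free in the output above is smaller than total:

```python
import shutil

def disk_report(path='/'):
    """Return total/used/free sizes in GiB plus free space as a percentage."""
    du = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        'total_gib': round(du.total / gib, 1),
        'used_gib': round(du.used / gib, 1),
        'free_gib': round(du.free / gib, 1),
        'free_pct': round(du.free / du.total * 100, 1),
    }

report = disk_report('/')
print(report)
```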

A simple health-check script that combines both checks:

In [76]:
#!/usr/bin/env python3
import shutil
import psutil

def check_disk_usage(disk):
    """Return True if more than 20 % of the disk is free."""
    du = shutil.disk_usage(disk)
    free = du.free / du.total * 100
    return free > 20

def check_cpu_usage():
    """Return True if CPU usage, sampled over 1 second, is below 75 %."""
    usage = psutil.cpu_percent(1)
    return usage < 75

if not check_disk_usage('/') or not check_cpu_usage():
    print('ERROR!')
else:
    print('OK!')

OK!


## Managing Processes

### Environment Variables

os.environ in Python is a mapping object that represents the user’s environment variables: the variable names are the keys and their contents are the values.

os.environ behaves like a Python dictionary, so all the common dictionary operations, such as get and set, can be performed. You can also modify os.environ, but changes only affect the current process and any child processes it starts afterwards; they do not change the value permanently.

#### Using os.environ to access environment variables

In [35]:
os.environ

environ{'LC_ALL': 'en_US.UTF-8',
        'LANG': 'en_US.UTF-8',
        'HOSTNAME': '51ffdbd7dfe3',
        'NB_UID': '1000',
        'CONDA_DIR': '/opt/conda',
        'CONDA_VERSION': '4.7.12',
        'PWD': '/home/jovyan',
        'HOME': '/home/jovyan',
        'MINICONDA_MD5': '1c945f2b3335c7b2b15130b1b2dc5cf4',
        'DEBIAN_FRONTEND': 'noninteractive',
        'NB_USER': 'jovyan',
        'SHELL': '/bin/bash',
        'SHLVL': '0',
        'LANGUAGE': 'en_US.UTF-8',
        'XDG_CACHE_HOME': '/home/jovyan/.cache/',
        'NB_GID': '100',
        'PATH': '/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin',
        'MINICONDA_VERSION': '4.7.10',
        'KERNEL_LAUNCH_TIMEOUT': '40',
        'JPY_PARENT_PID': '6',
        'TERM': 'xterm-color',
        'CLICOLOR': '1',
        'PAGER': 'cat',
        'GIT_PAGER': 'cat',
        'MPLBACKEND': 'module://ipykernel.pylab.backend_inline'}

#### Accessing a particular environment variable

In [36]:
# Get the value of 
# 'HOME' environment variable 
home = os.environ['HOME'] 

In [37]:
home

'/home/jovyan'

In [39]:
# Get the value of 
# 'HOME' environment variable 
# using get operation of dictionary 
os.environ.get('HOME') 

'/home/jovyan'

In [42]:
# Supply a default for an environment variable that does not exist
os.environ.get('HOMEE', '/home') 

'/home'

#### Modifying an environment variable

In [60]:
os.environ['TEST'] = 'lala'  # values must be strings

In [61]:
os.environ.get('TEST', 'Not Set')

'lala'

In [62]:
!echo $TEST

lala
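Instead of changing os.environ for the whole process, you can also hand a child process its own environment through the env argument of subprocess.run(). In this sketch the variable name DEMO_GREETING is made up, and the child is simply another Python interpreter (sys.executable) so the example runs on any platform:

```python
import os
import subprocess
import sys

# Copy the current environment and add one variable for the child only
child_env = dict(os.environ, DEMO_GREETING='hello from parent')

completed = subprocess.run(
    [sys.executable, '-c', 'import os; print(os.environ["DEMO_GREETING"])'],
    env=child_env,
    capture_output=True,
    text=True,
)
child_output = completed.stdout.strip()

# The parent's own environment is left unchanged
parent_has_greeting = 'DEMO_GREETING' in os.environ
```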


## Python Subprocesses (Executing External Commands)

[subprocess — Subprocess management](https://docs.python.org/3/library/subprocess.html)

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.

The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.    

The subprocess module supports three APIs for working with processes. The run() function, added in Python 3.5, is a high-level API for running a process and optionally collecting its output. The functions call(), check_call(), and check_output() are the former high-level API, carried over from Python 2. They are still supported and widely used in existing programs. The class Popen is a low-level API used to build the other APIs and useful for more complex process interactions. The constructor for Popen takes arguments to set up the new process so the parent can communicate with it via pipes. It provides all of the functionality of the other modules and functions it replaces, and more. The API is consistent for all uses, and many of the extra steps of overhead needed (such as closing extra file descriptors and ensuring the pipes are closed) are “built in” instead of being handled by the application code separately.

The subprocess module is intended to replace functions such as os.system(), os.spawnv(), the variations of popen() in the os and popen2 modules, as well as the Python 2 commands module.

### Running External Commands

To run an external command without interacting with it, much as os.system() does, use the run() function.

In [78]:
#subprocess_os_system.py
import subprocess
completed = subprocess.run(['ls', '-l'])
print('returncode:', completed.returncode)

returncode: 0


The command line arguments are passed as a list of strings, which avoids the need for escaping quotes or other special characters that might be interpreted by the shell. run() returns a CompletedProcess instance, with information about the process like the exit code and output.

Setting the shell argument to a true value causes subprocess to spawn an intermediate shell process which then runs the command. The default is to run the command directly.

Normally, commands are executed without the assistance of an underlying shell (e.g., sh, bash, etc.). Instead, the list of strings supplied are given to a low-level system command, such as os.execve(). If you want the command to be interpreted by a shell, supply it using a simple string and give the shell=True argument. This is sometimes useful if you’re trying to get Python to execute a complicated shell command involving pipes, I/O redirection, and other features.

> If shell is True, the specified command will be executed through the shell. This can be useful if you are using Python primarily for the enhanced control flow it offers over most system shells and still want convenient access to other shell features such as shell pipes, filename wildcards, environment variable expansion, and expansion of ~ to a user’s home directory.

In [79]:
#subprocess_shell_variables.py
import subprocess
completed = subprocess.run('echo $HOME', shell=True)
print('returncode:', completed.returncode)

returncode: 0


Using an intermediate shell means that variables, glob patterns, and other special shell features in the command string are processed before the command is run.

Be aware that executing commands under the shell is a potential security risk if arguments
are derived from user input. The shlex.quote() function can be used to properly
quote arguments for inclusion in shell commands in this case.
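As a sketch of why quoting matters, assume a POSIX shell (shlex.quote() is designed for Unix shells, not cmd.exe) and a hypothetical user_input value that tries to sneak in a second command. After quoting, the shell treats the whole value as one literal argument to echo:

```python
import shlex
import subprocess

# Imagine this came from user input and tries to inject a second command
user_input = "notes.txt; echo INJECTED"

# Unquoted, f'echo {user_input}' would make the shell run two commands.
quoted = shlex.quote(user_input)
safe_cmd = f'echo {quoted}'

result = subprocess.run(safe_cmd, shell=True, capture_output=True, text=True)
print(result.stdout)
```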

> Using run() without passing check=True is equivalent to using call(), which only returns the exit code from the process.

### Error Handling

The returncode attribute of the CompletedProcess is the exit code of the program. The caller is responsible for interpreting it to detect errors. If the check argument to run() is True, the exit code is checked and if it indicates an error happened then a CalledProcessError exception is raised.

In [80]:
#subprocess_run_check.py
import subprocess

try:
    subprocess.run(['false'], check=True)
except subprocess.CalledProcessError as err:
    print('ERROR:', err)

ERROR: Command '['false']' returned non-zero exit status 1.


The false command always exits with a non-zero status code, which run() interprets as an error.

> Passing check=True to run() makes it equivalent to using check_call().
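The CalledProcessError exception carries details about the failure, such as returncode and, when output is captured, stderr. In this sketch the failing command is a small Python one-liner (chosen only so the example is portable) that writes to standard error and exits with status 3:

```python
import subprocess
import sys

try:
    subprocess.run(
        [sys.executable, '-c',
         'import sys; print("oops", file=sys.stderr); sys.exit(3)'],
        check=True,
        capture_output=True,
        text=True,
    )
except subprocess.CalledProcessError as err:
    exit_code = err.returncode       # the child's exit status
    error_text = err.stderr.strip()  # captured standard error
    print('exit code:', exit_code, '| stderr:', error_text)
```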

### Capturing Output

Often you want to execute an external command and collect its output as a Python string.

If capture_output is true, stdout and stderr will be captured. When used, the internal Popen object is automatically created with stdout=PIPE and stderr=PIPE. The stdout and stderr arguments may not be supplied at the same time as capture_output. If you wish to capture and combine both streams into one, use stdout=PIPE and stderr=STDOUT instead of capture_output.

In [81]:
completed = subprocess.run(['ls', '-l'], capture_output=True)

In [84]:
completed

CompletedProcess(args=['ls', '-l'], returncode=0, stdout=b'total 148\n-rw-r--r-- 1 jovyan users 141049 Feb 11 22:38 Command_line_with_python.ipynb\ndrwxr-xr-x 8 jovyan users   4096 Feb 11 19:54 data\ndrwxrwxr-x 2 jovyan  1000   4096 Feb 11 22:17 skripte\n', stderr=b'')

In [85]:
completed.returncode

0

The captured output is stored in completed.stdout as a byte string. If you need to interpret the resulting bytes as text, add a decoding step. For example:

In [83]:
completed.stdout

b'total 148\n-rw-r--r-- 1 jovyan users 141049 Feb 11 22:38 Command_line_with_python.ipynb\ndrwxr-xr-x 8 jovyan users   4096 Feb 11 19:54 data\ndrwxrwxr-x 2 jovyan  1000   4096 Feb 11 22:17 skripte\n'

In [87]:
out_text = completed.stdout.decode('utf-8')

In [89]:
print(out_text)

total 148
-rw-r--r-- 1 jovyan users 141049 Feb 11 22:38 Command_line_with_python.ipynb
drwxr-xr-x 8 jovyan users   4096 Feb 11 19:54 data
drwxrwxr-x 2 jovyan  1000   4096 Feb 11 22:17 skripte



In [86]:
completed.stderr

b''

If encoding or errors are specified, or text is true, file objects for stdin, stdout and stderr are opened in text mode using the specified encoding and errors or the io.TextIOWrapper default. The universal_newlines argument is equivalent to text and is provided for backwards compatibility. By default, file objects are opened in binary mode.

In [96]:
subprocess.run(['ls', '-l'], capture_output=True, text=True)

CompletedProcess(args=['ls', '-l'], returncode=0, stdout='total 148\n-rw-r--r-- 1 jovyan users 142601 Feb 11 22:54 Command_line_with_python.ipynb\ndrwxr-xr-x 8 jovyan users   4096 Feb 11 19:54 data\ndrwxrwxr-x 2 jovyan  1000   4096 Feb 11 22:17 skripte\n', stderr='')
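To capture both streams combined into one, as mentioned above, use stdout=PIPE together with stderr=STDOUT instead of capture_output. In this sketch the child is a Python one-liner run with -u (unbuffered output); the exact interleaving of the two streams can still depend on buffering, so the example only checks which lines arrived:

```python
import subprocess
import sys

script = 'import sys; print("to stdout"); print("to stderr", file=sys.stderr)'
completed = subprocess.run(
    [sys.executable, '-u', '-c', script],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # merge stderr into the stdout pipe
    text=True,
)
lines = completed.stdout.splitlines()
print(lines)
```

Because stderr is redirected, completed.stderr is None.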

### Timeouts

The timeout argument is passed to Popen.communicate(). If the timeout expires, the child process will be killed and waited for. The TimeoutExpired exception will be re-raised after the child process has terminated.

If you need to execute a command with a timeout, use the timeout argument:

In [91]:
try:
    completed = subprocess.run(['ls'], timeout=5)
except subprocess.TimeoutExpired as e:
    print('error:', e)

In [95]:
try:
    completed = subprocess.run(['sleep', '6'], timeout=2)
except subprocess.TimeoutExpired as e:
    print('error:', e)

error: Command '['sleep', '6']' timed out after 2 seconds


### Suppressing Output

For cases where the output should not be shown or captured, use DEVNULL to suppress an output stream. This example suppresses both the standard output and error streams.

In [None]:
completed = subprocess.run(
    'cat example01.py',
    shell=True,
    stdout=subprocess.DEVNULL,  # discard standard output
    stderr=subprocess.DEVNULL,  # discard error messages
)

The name DEVNULL comes from the Unix special device file, /dev/null, which responds with end-of-file when opened for reading and receives but ignores any amount of input when writing.

## Exercise 1

Run `ls -lah /` and collect the names of all directories and files that do not start with a `.` into a list.

In [18]:
import subprocess
completed = subprocess.run(['ls', '-lah','/'], capture_output=True)

In [19]:
text = completed.stdout.decode('utf-8')

In [46]:
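# Skip the first "total ..." line; the 9th column of ls -l output is the name
# (names containing spaces would be cut off by split())
names = [line.split()[8] for line in text.splitlines()[1:]]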

In [48]:
final_names = [name for name in names if not name.startswith('.')]

In [50]:
print(final_names)

['bin', 'boot', 'dev', 'etc', 'home', 'lib', 'lib64', 'media', 'mnt', 'opt', 'proc', 'root', 'run', 'sbin', 'srv', 'sys', 'tmp', 'usr', 'var']
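Parsing ls output is fragile (spaces in names, symlink arrows, locale-dependent formatting). As an alternative solution, not the one the exercise asks for, os.listdir() returns the names directly, so no text parsing is needed:

```python
import os

# os.listdir() returns bare names, so there is no ls output to parse
visible = sorted(name for name in os.listdir('/') if not name.startswith('.'))
print(visible)
```

Note that sorted() orders names by code point, which can differ slightly from the locale-aware order used by ls.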
