# Agenda

1. What is a project (vs. a program)?
2. The parts of a project
3. Virtualenv
4. Distribution packages (vs. regular packages)
5. Poetry
6. Create a package/project
7. Testing
8. Command-line program 
9. Publishing to PyPI with Poetry

# What is a project?

Anything that:
- Has more than one file
    - More than one module
    - Documentation
    - Command-line executable script
- You want to share with other people

So what? You'll need:
- More structure 
- Versioning
- Distribution mechanism


# PyPI is great! `pip` is great!

- If you define code in a single file, and then want to use the names defined in that file elsewhere, that's a "module."
- If you have more than one module that should be grouped together, we call that a "package."
    - Python packages are directories that contain modules.
    - Python packages also provide us with deeper namespaces
- If you want to distribute your package to other people, then you need a "distribution package."    
- When you download and install something from PyPI, using `pip`, you're installing a "distribution package."
- When you install a package with `pip`, it typically goes into your `site-packages` directory. It's installed there in a subdirectory with the name of the package you installed. This means that if you are working on two different projects, and each project uses the same package from PyPI, but in different versions... you're in trouble.
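
To see where `pip` puts things on your own machine, you can ask Python directly. This is a minimal sketch using only the standard library; the path it prints will differ from system to system (and on some Linux distributions the directory is named `dist-packages` instead).

```python
import sysconfig

# "purelib" is the directory where pure-Python distribution packages get installed
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)
```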

# Solution: Virtual environments

Virtual environments (aka "venvs") are a way to provide your particular project with its own, independent `site-packages` directory. If every project you work on uses a different venv, then each project can install different versions of the same package from PyPI, without interfering with one another.

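Python pays attention to venvs thanks to some manipulation of the environment variables in your command-line shell: activating a venv puts the venv's own directory of executables at the front of your `PATH`, so that `python` and `pip` resolve to the venv's copies, which in turn use the venv's `site-packages`.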
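
One way to check, from inside Python, whether a venv is currently in use. This is only a sketch: `in_virtualenv` is a name made up for this example, not something from the standard library.

```python
import os
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


print(in_virtualenv())
# The venv's "activate" script sets VIRTUAL_ENV; without activation this is None
print(os.environ.get("VIRTUAL_ENV"))
```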

# Where are we now?

- We understand what a package is (i.e., a directory with `__init__.py` and one or more files + subdirectories)
- Distribution packages, which are packages + metadata
- wheelfiles vs. .tar.gz
- pip can install wheelfiles
- installing .tar.gz is not as good, but sometimes necessary (`python setup.py install`)

In [1]:
import mypackage

In [2]:
dir(mypackage)

['__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__']

In [3]:
type(mypackage)

module

In [4]:
from mypackage import a  # go into the "mypackage" directory, and find "a.py", and load it as a module -- here, a is a global variable

In [8]:
import mypackage.a       # define a as an attribute on mypackage

In [5]:
from mypackage import b

In [6]:
a.hello('world')

'Hello from a, world!'

In [7]:
b.hello('world')

'Hello from b, world!'

In [1]:
import mypackage

Hello from __init__.py!


In [2]:
dir(mypackage)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__']

In [1]:
from mypackage import a

Hello from __init__.py!


In [1]:
import mypackage.b

Hello from __init__.py!
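
If you want to reproduce the session above without creating files by hand, here is a self-contained sketch that builds a throwaway `mypackage` in a temporary directory and imports from it. The contents of `__init__.py`, `a.py`, and `b.py` are assumptions, reconstructed from the output shown in the cells above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway "mypackage" directory in a temporary location
root = Path(tempfile.mkdtemp())
pkg = root / "mypackage"
pkg.mkdir()
(pkg / "__init__.py").write_text('print("Hello from __init__.py!")\n')
for name in ("a", "b"):
    (pkg / f"{name}.py").write_text(
        f"def hello(who):\n    return f'Hello from {name}, {{who}}!'\n"
    )

sys.path.insert(0, str(root))   # let "import" find our temporary directory
importlib.invalidate_caches()

from mypackage import a, b      # runs __init__.py once, then loads a.py and b.py

print(a.hello("world"))
print(b.hello("world"))
```

Notice that `__init__.py` runs only once, no matter how many modules you then import from the package.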


# Next up

1. How Poetry solves all of these problems
2. We'll start developing an application using Poetry

In order to do this next part, please install Poetry!  You can check whether Poetry is installed by running `poetry --version` on the command line.

Don't install it inside of a venv!


# How do we use Poetry?

It's a command-line tool with a very large number of commands. You typically write

    poetry command args...
    
The first thing you have to do is create a new project with Poetry.  You can do that with

    poetry new PROJNAME
    
That will create a new subdirectory with that project name.  If you already have a project and you want to use Poetry to work with it, then you can do that as well, but we're going to ignore that for now.    

# What will we be doing?

We're going to create a project that allows us to do some basic reading and filtering of Apache logfiles.  The idea is that, with our project:

- We'll have a module that defines a class (`ApacheParse`)
- Each instance of that class will allow us to parse and look at an Apache logfile
- We'll have several methods we can call on the object (`ap`) to get back the lines from that logfile in Python objects.
- We'll also have a command-line program we can use to invoke our parser.

# First: Create our project

I'm going to call this the "aparse" project, and I'm going to create it with Poetry under the name "rmlaparse".

# Exercise: Create a module

Write a module, `aparser.py`, inside of our package. That module should define a class, `ApacheParse`, which takes a single argument: a string containing the name of the file we want to open.  The file should be opened, and the opened file object should be stored in an attribute, `f`.
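
One possible starting point, if you get stuck. Treat it as a sketch, not the only right answer; the only names taken from the exercise are `ApacheParse` and `f`.

```python
class ApacheParse:
    """Open an Apache logfile and keep the file object around for later parsing."""

    def __init__(self, filename):
        # Store the open file object on the instance, as the exercise asks
        self.f = open(filename)
```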



# Exercise: Parse with `csv`

Add a method, `parse_file`, to your `ApacheParse` class. That method should take no arguments (other than `self`), and it should then:
- Create a `csv.reader` instance, to read data with a `delimiter` set to a space character
- Iterate over each row in the file
- `yield` the current row, so as to be a generator method

In the end, you should be able to do this:

```python
ap = ApacheParse('access.log.1')
for one_record in ap.parse_file():
    print(one_record)   # this should print a list of strings, one such list for each line in the file
```
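
A minimal sketch of what `parse_file` might look like. Opening the file with `newline=""` follows the recommendation in the `csv` module's documentation and is an addition here, not part of the exercise.

```python
import csv


class ApacheParse:
    def __init__(self, filename):
        # newline="" lets the csv module handle line endings itself
        self.f = open(filename, newline="")

    def parse_file(self):
        # Split each line on spaces; double-quoted fields stay together
        reader = csv.reader(self.f, delimiter=" ")
        for row in reader:
            yield row   # the "yield" makes this a generator method
```

Because `csv.reader` treats double quotes as field delimiters by default, a quoted request such as `"GET / HTTP/1.0"` comes back as a single string even though it contains spaces.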

# Next up:

1. Add testing with `pytest`
2. Add new functionality, as well
3. Command-line script


# Exercise: Add some tests

1. Add the tests that I created already (i.e., to create a new instance, check that it's iterable)
2. Add a test to make sure that we're getting a list of strings with each returned element from `parse_file`.

In order to do this, you'll need to install `pytest` with `pip install pytest`.

If you have it installed but the `pytest` command doesn't seem to work on the command line, you can try `python3 -m pytest`.
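
Here is a sketch of what such tests might look like. To keep it self-contained, it includes a stand-in `ApacheParse`; in the real project you would import the class from your package instead. `SAMPLE_LINE` and `make_log` are made up for this example, while `tmp_path` is a fixture that `pytest` provides (a fresh temporary directory for each test).

```python
import csv


class ApacheParse:   # stand-in; normally imported from your own package
    def __init__(self, filename):
        self.f = open(filename, newline="")

    def parse_file(self):
        yield from csv.reader(self.f, delimiter=" ")


SAMPLE_LINE = ('127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] '
               '"GET / HTTP/1.0" 200 2326 "-" "Mozilla/4.08"\n')


def make_log(tmp_path):
    """Write a three-line logfile into tmp_path and return its name."""
    logfile = tmp_path / "access.log"
    logfile.write_text(SAMPLE_LINE * 3)
    return str(logfile)


def test_parse_file_is_iterable(tmp_path):
    ap = ApacheParse(make_log(tmp_path))
    assert iter(ap.parse_file())   # raises TypeError if not iterable


def test_rows_are_lists_of_strings(tmp_path):
    ap = ApacheParse(make_log(tmp_path))
    rows = list(ap.parse_file())
    assert len(rows) == 3
    for row in rows:
        assert isinstance(row, list)
        assert all(isinstance(field, str) for field in row)
```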

In [4]:
# the "all" function returns True if every element of the iterable is truthy
all([True, True, True])

True

In [5]:
all([2==2, 4==4, 10==10])

True

In [6]:
list('abcd')

['a', 'b', 'c', 'd']

# Interface vs. implementation

Everything in Python is public. That is, we can know about every attribute on every object in the system. Does this mean, though, that we *should* have access to it, or that we should depend on being able to read from every attribute, and/or write to every attribute?

We can set attributes (both data and methods) to be private in Python not via keywords or rules, but rather by convention: Any name that starts with `_` is considered to be private.  If you use an attribute in another object that starts with `_`, then you are violating that expectation, and you might be unpleasantly surprised in the future.
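
A small illustration of the convention. The `Account` class is hypothetical and exists only for this example.

```python
class Account:
    def __init__(self, balance):
        self._balance = balance   # leading underscore: internal, please don't touch

    def balance(self):            # part of the public interface
        return self._balance


acct = Account(100)
print(acct.balance())   # the supported way to get the value
print(acct._balance)    # works, but depends on an implementation detail
```

Python doesn't stop you from reading `_balance`. The underscore only signals that the author reserves the right to rename or remove it.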

# Exercise: Return dicts, not lists

1. Add a new method, `record_dicts`, to our `ApacheParse` class.
2. This method should return records based on all of the lines in the file.
3. However, each record should be returned as a dictionary. The keys will be strings (that I'll provide in a moment), and the values will be the strings from the current line.
4. The field names will be defined in a class attribute, `_RECORD_FIELDS`, which (as you can see) is going to be private.
5. Someone who calls `record_dicts` should get a generator back, which provides one dict per iteration.
6. If you can, add tests to check that you're getting reasonable values back.

```python
_RECORD_FIELDS = ['ip_address', 'auth_info1', 'auth_info2',
                 'timestamp_main_part', 'timestamp_tz', 'request',
                 'response_code', 'bytes_returned', 'referrer',
                 'user_agent']
```

Hint: Use the `dict` and `zip` functions to combine the strings in `_RECORD_FIELDS` with the values from the current line.
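
To see how the hint works, here is `dict` + `zip` on a shortened, made-up list of fields and values (not the full `_RECORD_FIELDS`):

```python
fields = ['ip_address', 'response_code', 'bytes_returned']
values = ['127.0.0.1', '200', '2326']

# zip pairs up the items by position; dict turns the pairs into key-value entries
record = dict(zip(fields, values))
print(record)
```

Keep in mind that `zip` stops at the shorter of its inputs, so a line with fewer fields than expected produces a dict with missing keys without raising an error.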

In [7]:
# to get pytest --cov to work, try installing (via pip) pytest-cov and coverage