# Setting Up Our Interface Class

### Try it!
Let's take a minute to lay the groundwork for our strategy object importer:
1. Review the file structure and organization of the skeleton code we've provided.
2. Create a new `ImportInterface` abstract class with:
 - A `can_ingest` class method that decides whether a file is compatible with the importer.
 - An abstract `parse` class method signature, which the child classes that implement `ImportInterface` will fill in.

```python
from abc import ABC, abstractmethod

from typing import List
from .Cat import Cat

class ImportInterface(ABC):

    allowed_extensions = []

    @classmethod
    def can_ingest(cls, path):
        ext = path.split('.')[-1]
        return ext in cls.allowed_extensions

    @classmethod
    @abstractmethod
    def parse(cls, path: str) -> List[Cat]:
        pass
```
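
To see how `can_ingest` behaves, here is a minimal, self-contained sketch. The `TxtImporter` subclass and the file names are hypothetical stand-ins for illustration, and `parse` returns plain strings here so the example does not depend on the `Cat` class.

```python
from abc import ABC, abstractmethod
from typing import List


class ImportInterface(ABC):
    allowed_extensions: List[str] = []

    @classmethod
    def can_ingest(cls, path: str) -> bool:
        # Everything after the last dot is treated as the extension.
        ext = path.split('.')[-1]
        return ext in cls.allowed_extensions

    @classmethod
    @abstractmethod
    def parse(cls, path: str) -> List[str]:
        pass


class TxtImporter(ImportInterface):
    """Hypothetical child class, used only to exercise can_ingest."""
    allowed_extensions = ['txt']

    @classmethod
    def parse(cls, path: str) -> List[str]:
        return []


print(TxtImporter.can_ingest('notes.txt'))    # True
print(TxtImporter.can_ingest('cats.docx'))    # False
print(TxtImporter.can_ingest('archive.TXT'))  # False
```

Note that the comparison is case-sensitive, so an upper-case extension is rejected unless it is listed in `allowed_extensions`.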

<!--
%%ulab_page_divider
--><hr/>

## Importing Word Documents

### Try it!

Before implementing our code, we need to install the `python-docx` library to work with Word documents in Python. This library requires an up-to-date version of a Python helper module called `setuptools`. To install the updated helper and the docx library, run:
```bash
pip install -U setuptools
pip install python-docx
```

Then, we're ready to implement our first strategy object:
1. Create a new `DocxImporter` class that inherits `ImportInterface`.
2. Implement the `parse` method that uses the `python-docx` library to read data from a `docx` file.
3. Import and use your importer in the `run.py` file.

```python
from typing import List
import docx

from .ImportInterface import ImportInterface
from .Cat import Cat

class DocxImporter(ImportInterface):
    allowed_extensions = ['docx']
    
    @classmethod
    def parse(cls, path: str) -> List[Cat]:
        if not cls.can_ingest(path):
            raise Exception('cannot ingest exception')
        
        cats = []
        doc = docx.Document(path)
        
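        # Each non-empty paragraph holds one cat as comma-separated fields.
        for para in doc.paragraphs:
            if para.text != "":
                parsed = para.text.split(',')
                # bool() on a non-empty string is always True, so compare the
                # text instead (assumes the flag is stored as "True"/"False").
                is_indoor = parsed[2].strip().lower() == 'true'
                new_cat = Cat(parsed[0], int(parsed[1]), is_indoor)
                cats.append(new_cat)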
                
        return cats
```

```python
from ImportEngine import DocxImporter

print(DocxImporter.parse('./data/cats.docx'))
```
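
When converting the third field of each line, keep in mind that `bool()` applied to any non-empty string returns `True`, so the text `False` needs an explicit comparison. The sketch below illustrates this with a hypothetical helper, `parse_cat_line`, and a made-up sample line; it assumes the flag is stored as the text `True` or `False`.

```python
from typing import Tuple


def parse_cat_line(line: str) -> Tuple[str, int, bool]:
    """Split one 'name,age,isIndoor' line into typed fields (hypothetical helper)."""
    name, age, is_indoor = [field.strip() for field in line.split(',')]
    # Compare the text explicitly: bool('False') would evaluate to True.
    return name, int(age), is_indoor.lower() == 'true'


print(bool('False'))                    # True, because the string is non-empty
print(parse_cat_line('Tom, 3, False'))  # ('Tom', 3, False)
```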

<!--
%%ulab_page_divider
--><hr/>

## Importing CSV Files

### Try it!

Before implementing our code, we need to install the `pandas` library to work with CSV files in Python by running:
```bash
pip install pandas
```

Then, we're ready to implement our second strategy object:
1. Create a new `CSVImporter` class that inherits `ImportInterface`.
2. Implement the `parse` method that uses the `pandas` library to read data from a `csv` file.
3. Import and use your importer in the `run.py` file.

```python
from typing import List
import pandas

from .ImportInterface import ImportInterface
from .Cat import Cat

class CSVImporter(ImportInterface):
    allowed_extensions = ['csv']
    
    @classmethod
    def parse(cls, path: str) -> List[Cat]:
        if not cls.can_ingest(path):
            raise Exception('cannot ingest exception')
        
        cats = []
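        # The first row of the file holds the column names.
        df = pandas.read_csv(path, header=0)

        # Build one Cat per row from the Name, Age, and isIndoor columns.
        for _, row in df.iterrows():
            new_cat = Cat(row['Name'], row['Age'], row['isIndoor'])
            cats.append(new_cat)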
            
        return cats
```

<!--
%%ulab_page_divider
--><hr/>

## Encapsulating Our Strategy Objects

### Try it!
Encapsulation can make our software easier to work with. Refactor your code to:
1. Include a new `Importer` class that encapsulates the `CSVImporter` and `DocxImporter` classes. It should implement `ImportInterface`.
2. Write a `parse` method that chooses which importer to use based on the file type.
3. Refactor `run.py` to consume the `Importer` class!


```python
from typing import List

from .ImportInterface import ImportInterface
from .Cat import Cat
from .DocxImporter import DocxImporter
from .CSVImporter import CSVImporter


class Importer(ImportInterface):
    importers = [DocxImporter, CSVImporter]
    
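    @classmethod
    def parse(cls, path: str) -> List[Cat]:
        # Delegate to the first importer that accepts this file extension.
        for importer in cls.importers:
            if importer.can_ingest(path):
                return importer.parse(path)
        # No strategy matched: fail loudly instead of silently returning None.
        raise Exception('cannot ingest exception')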
```
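
The dispatch logic can be tried out without any real files. The sketch below is a self-contained illustration, not the project code: `StubImporter` and its two subclasses are hypothetical stand-ins that return a label instead of `Cat` objects, and raising `ValueError` for an unsupported file type is an assumption about the desired behavior.

```python
from typing import List


class StubImporter:
    """Hypothetical base for the stand-in strategies below."""
    allowed_extensions: List[str] = []
    label = 'stub'

    @classmethod
    def can_ingest(cls, path: str) -> bool:
        return path.split('.')[-1] in cls.allowed_extensions

    @classmethod
    def parse(cls, path: str) -> List[str]:
        # A real importer would return a list of Cat objects here.
        return [f'{cls.label} strategy handled {path}']


class StubDocxImporter(StubImporter):
    allowed_extensions = ['docx']
    label = 'docx'


class StubCSVImporter(StubImporter):
    allowed_extensions = ['csv']
    label = 'csv'


class Importer:
    importers = [StubDocxImporter, StubCSVImporter]

    @classmethod
    def parse(cls, path: str) -> List[str]:
        # Try each strategy in order and delegate to the first one that matches.
        for importer in cls.importers:
            if importer.can_ingest(path):
                return importer.parse(path)
        # Assumption: an unsupported file type should raise, not return None.
        raise ValueError(f'no importer available for {path}')


print(Importer.parse('./data/cats.csv'))   # ['csv strategy handled ./data/cats.csv']
print(Importer.parse('./data/cats.docx'))  # ['docx strategy handled ./data/cats.docx']
```

The caller only ever talks to `Importer.parse`, so supporting a new file type means adding one class to the `importers` list.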

<!--
%%ulab_page_divider
--><hr/>

# Using Subprocess to Interface with CLI Tools

### Try it!

Before starting, make sure you have the xpdf tool installed by running:
```bash
sudo apt-get install -y xpdf
```

Next, try using the command-line tool to convert the PDF into a text file. In the terminal, run:
```bash
pdftotext data/cats.pdf tmp/a.txt
cat tmp/a.txt
```

The first command converts the PDF to a text file. The second command, `cat` (the name is short for "concatenate" and has nothing to do with our cats), concatenates its argument files and prints them to standard output. Since we pass only `a.txt`, that file's contents are printed to the terminal window.

Finally, create your new `PDFImporter` class that performs the following steps:
1. Creates a random filename for the output.
2. Uses `subprocess` to call the `pdftotext` tool on the input path, saving the output to the random file.
3. Uses Python's built-in `open` function to open the text file and read it [line-by-line](https://stackabuse.com/read-a-file-line-by-line-in-python/).
4. For each line, parse a new `Cat` object.
5. Remove the temporary text file.
6. Return the list of cats.

```python
from typing import List
import subprocess
import os
import random

from .ImportInterface import ImportInterface
from .Cat import Cat

class PDFImporter(ImportInterface):
    allowed_extensions = ['pdf']

    @classmethod
    def parse(cls, path: str) -> List[Cat]:
        if not cls.can_ingest(path):
            raise Exception('Cannot Ingest Exception')

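        # Write the converted text to a randomly named file (assumes ./tmp exists).
        tmp = f'./tmp/{random.randint(0,1000000)}.txt'
        subprocess.call(['pdftotext', path, tmp])

        cats = []
        # The with-block closes the file even if parsing a line fails.
        with open(tmp, "r") as file_ref:
            for line in file_ref.readlines():
                line = line.strip('\n\r').strip()
                if len(line) > 0:
                    parsed = line.split(',')
                    # bool() on a non-empty string is always True, so compare
                    # the text instead (assumes "True"/"False" in the file).
                    new_cat = Cat(parsed[0],
                                  int(parsed[1]),
                                  parsed[2].strip().lower() == 'true')
                    cats.append(new_cat)

        os.remove(tmp)
        return cats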
```
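
The numbered steps above follow a general pattern: run a command-line tool that writes to a temporary file, read the file, then clean it up. Below is a minimal sketch of that pattern. Since `pdftotext` may not be installed everywhere, the Python interpreter itself stands in for the tool; `run_tool_to_lines`, the stand-in script, and the sample lines are all hypothetical. The sketch also uses `tempfile.mkstemp` rather than a random integer to get a unique file name, which is a swapped-in alternative to the approach above.

```python
import os
import subprocess
import sys
import tempfile
from typing import List


def run_tool_to_lines(command: List[str]) -> List[str]:
    """Run `command` with a temp file path appended; return the file's non-empty lines."""
    # mkstemp creates a uniquely named empty file and returns a descriptor plus its path.
    fd, tmp_path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        # check=True raises CalledProcessError if the tool exits with a non-zero status.
        subprocess.run(command + [tmp_path], check=True)
        with open(tmp_path, 'r') as file_ref:
            return [line.strip() for line in file_ref if line.strip()]
    finally:
        # Remove the temporary file whether or not the steps above succeeded.
        os.remove(tmp_path)


# Stand-in for the real tool: a one-line script that writes sample text
# to the path it receives as its first argument.
fake_tool = (
    "import pathlib, sys; "
    "pathlib.Path(sys.argv[1]).write_text('Tom,3,True\\n\\nFelix,5,False\\n')"
)
lines = run_tool_to_lines([sys.executable, '-c', fake_tool])
print(lines)  # ['Tom,3,True', 'Felix,5,False']
```

With the real tool, the call would be `run_tool_to_lines(['pdftotext', path])`, which matches the argument order used in `PDFImporter.parse`.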