![ine-divider](https://user-images.githubusercontent.com/7065401/92672068-398e8080-f2ee-11ea-82d6-ad53f7feb5c0.png)
<hr>

### Advanced Python: Iterators, Generators, Context Managers & Decorators

# Final project: Combining protocols

In this final project, we specify a series of tasks, each of which combines two or more of the protocol techniques discussed in this course.

For these tasks, we use some data inside an SQLite database that has been used in some other INE courses as well.  The requirement is, of necessity, slightly forced, since a better solution than the one you will provide is simply "use the excellent DB-API and `sqlite3` driver."  However, in order to simulate the need for more from-scratch construction, suppose that you can **only** access the data via the following class.

In [15]:
import sqlite3
from collections import namedtuple

class RawTweets:
    "Limited interface for access to tweet database"
    known_names = {
        'airline', 'airline_sentiment', 'airline_sentiment_confidence',
        'airline_sentiment_gold', 'name', 'negativereason', 
        'negativereason_confidence', 'negativereason_gold', 
        'retweet_count', 'text', 'tweet_coord', 'tweet_created', 
        'tweet_id', 'tweet_location', 'user_timezone'}
    
    def __init__(self, *colnames):
        self.__db = sqlite3.connect('Airline-Tweets.sqlite')
        self.__cur = self.__db.cursor()
        if not set(colnames) <= self.known_names:
            raise ValueError(f"Must specify columns from {self.known_names}")
        self.__row = namedtuple('Row', colnames)
        self.__cur.execute(f"SELECT {','.join(colnames)} FROM Tweets;")
    
    @property
    def next_row(self):
        try:
            return self.__row(*self.__cur.fetchone())
        except:
            return None
    
    def close(self):
        self.__db.close()

In other words, the only API you are provided with is creating an instance of `RawTweets` initialized with column names.  The only public attributes are the property `.next_row` and the method `.close()`.
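
One detail worth noticing before you start: `.next_row` signals exhaustion by returning `None` rather than by raising. `fetchone()` returns `None` once the cursor is drained, unpacking `*None` raises a `TypeError`, and the `except` clause turns that into `None`. The following is a minimal sketch of that behavior, assuming an in-memory database with an invented two-row `Tweets` table as a stand-in for `Airline-Tweets.sqlite`. The free function `next_row()` is hypothetical and, unlike the class above, catches only `TypeError`.

```python
import sqlite3
from collections import namedtuple

# Stand-in for the course database: an in-memory table with invented rows
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Tweets (tweet_id INTEGER, airline TEXT)")
db.executemany("INSERT INTO Tweets VALUES (?, ?)", [(1, "Delta"), (2, "United")])

Row = namedtuple("Row", ["tweet_id", "airline"])
cur = db.cursor()
cur.execute("SELECT tweet_id, airline FROM Tweets;")

def next_row():
    # Hypothetical helper shaped like RawTweets.next_row
    try:
        return Row(*cur.fetchone())
    except TypeError:
        # fetchone() returned None, so there are no more rows
        return None

# Two real rows, then None once the cursor is exhausted
rows = [next_row(), next_row(), next_row()]
db.close()
```

Any wrapper you build around `RawTweets` therefore needs to treat `None` as the end-of-data sentinel.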

In [17]:
id_date = RawTweets('tweet_id', 'tweet_created')
id_date.next_row

Row(tweet_id=567588278875213824, tweet_created='2015-02-16 23:36:05 -0800')

In [18]:
print(id_date.next_row)
id_date.close()

Row(tweet_id=567590027375702016, tweet_created='2015-02-16 23:43:02 -0800')


In [19]:
try:
    RawTweets('user_name', 'text')
except Exception as err:
    print(err)

Must specify columns from {'user_timezone', 'negativereason_gold', 'tweet_coord', 'retweet_count', 'name', 'airline_sentiment_gold', 'tweet_created', 'airline_sentiment', 'airline_sentiment_confidence', 'airline', 'negativereason', 'tweet_location', 'text', 'negativereason_confidence', 'tweet_id'}


![orange-divider](https://user-images.githubusercontent.com/7065401/92672455-187a5f80-f2ef-11ea-890c-40be9474f7b7.png)

## Part 1

**An iterable row set**

Create a constructor for an iterable object that sequentially yields tuples of specified fields of the underlying data.  Suppose that the underlying data is very large, or very slow to derive, and therefore you definitely want a lazy iterator for each element rather than a concrete collection like a list.

```python
>>> from itertools import islice
>>> for row in islice(tweet_rows('airline', 'airline_sentiment'), 1000, 1010):
...    print(row)
```
```
Row(airline='Southwest', airline_sentiment='positive')
Row(airline='Virgin America', airline_sentiment='neutral')
Row(airline='United', airline_sentiment='positive')
Row(airline='Delta', airline_sentiment='negative')
Row(airline='Delta', airline_sentiment='negative')
Row(airline='Delta', airline_sentiment='positive')
Row(airline='US Airways', airline_sentiment='negative')
Row(airline='Virgin America', airline_sentiment='positive')
Row(airline='US Airways', airline_sentiment='negative')
Row(airline='Delta', airline_sentiment='negative')
```
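
One possible shape for such a lazy iterator is a generator function that keeps asking for the next row until it sees `None`. The sketch below is not the reference solution. It assumes a hypothetical `FakeRawTweets` class with invented data that exposes the same `.next_row` and `.close()` interface, and the `source` keyword argument is an addition made only so the example runs without the course database.

```python
from collections import namedtuple
from itertools import islice

class FakeRawTweets:
    """Hypothetical stand-in for RawTweets, backed by invented rows."""
    _data = [("Delta", "negative"), ("United", "positive"),
             ("Southwest", "neutral")]

    def __init__(self, *colnames):
        self._row = namedtuple("Row", colnames)
        self._pending = iter(self._data)
        self.closed = False

    @property
    def next_row(self):
        values = next(self._pending, None)
        return None if values is None else self._row(*values)

    def close(self):
        self.closed = True

def tweet_rows(*colnames, source=FakeRawTweets):
    """Lazily yield one row at a time until the source returns None."""
    raw = source(*colnames)
    try:
        while True:
            row = raw.next_row
            if row is None:
                return
            yield row
    finally:
        # Runs when the generator is exhausted or explicitly closed
        raw.close()

# Only the rows actually requested are pulled from the source
first_two = list(islice(tweet_rows("airline", "airline_sentiment"), 2))
all_rows = list(tweet_rows("airline", "airline_sentiment"))
```

Because nothing is fetched until the consumer asks for it, `islice()` can take a small window without materializing the whole table.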

In [1]:
# your code goes here



![orange-divider](https://user-images.githubusercontent.com/7065401/92672455-187a5f80-f2ef-11ea-890c-40be9474f7b7.png)

## Part 2

**A context manager for row connections**

Opening an iterator over tuples of columns can be a useful abstraction, but it does not ensure that the data stream is closed when you have finished working with it.  To free the resources connected with a stream, we should use a context manager to guard use of the stream.  As a challenge here, we would like the very same object to be usable as *either* a context manager *or* an iterable (much as the object returned by the built-in function `open()` is for lines in a file).

```python
>>> for n, row in zip(range(5), TweetRows('tweet_location')):
...     print(n, row)
```
```
0 Row(tweet_location='USA')
1 Row(tweet_location='undecided')
2 Row(tweet_location='Washington, DC')
3 Row(tweet_location='')
4 Row(tweet_location='Los Angeles, CA')
```

```python
# This version explicitly closes the stream
>>> with TweetRows('tweet_location') as location:
...     for n in range(5):
...         print(n, next(location))
```
```
0 Row(tweet_location='USA')
1 Row(tweet_location='undecided')
2 Row(tweet_location='Washington, DC')
3 Row(tweet_location='')
4 Row(tweet_location='Los Angeles, CA')
```
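
A single class can support both uses by implementing the iterator protocol (`__iter__`, `__next__`) alongside the context manager protocol (`__enter__`, `__exit__`). The following is a rough sketch under stated assumptions, not the expected answer: `FakeRawTweets` is a hypothetical stand-in with invented rows, and the `source` keyword argument exists only to make the example self-contained.

```python
from collections import namedtuple

class FakeRawTweets:
    """Hypothetical stand-in for RawTweets, backed by invented rows."""
    _data = [("USA",), ("",), ("Washington, DC",)]

    def __init__(self, *colnames):
        self._row = namedtuple("Row", colnames)
        self._pending = iter(self._data)
        self.closed = False

    @property
    def next_row(self):
        values = next(self._pending, None)
        return None if values is None else self._row(*values)

    def close(self):
        self.closed = True

class TweetRows:
    """Usable in a for loop, with next(), or in a with statement."""

    def __init__(self, *colnames, source=FakeRawTweets):
        self._raw = source(*colnames)

    # Iterator protocol
    def __iter__(self):
        return self

    def __next__(self):
        row = self._raw.next_row
        if row is None:
            raise StopIteration
        return row

    # Context manager protocol
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._raw.close()
        return False  # do not suppress exceptions from the with block

with TweetRows("tweet_location") as location:
    first = next(location)
    raw = location._raw  # kept only so the example can inspect the stand-in

locations = [row.tweet_location for row in TweetRows("tweet_location")]
```

Returning `self` from `__enter__` is what lets `next(location)` work inside the `with` block, and `__exit__` closes the underlying stream even if the block raises.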

In [2]:
# your code goes here



![orange-divider](https://user-images.githubusercontent.com/7065401/92672455-187a5f80-f2ef-11ea-890c-40be9474f7b7.png)

## Part 3

**Using TweetRows as a decorator**

In a similar fashion to how the lesson injected "orbits" into a single function that transforms a complex number, we would like to be able to decorate a function that tests a single row object, turning it into a lazy iterable that tests each row of a stream in turn.  For example,

```python
>>> from collections import namedtuple
>>> Row = namedtuple('Row', ['tweet_id', 'airline_sentiment', 'retweet_count'])
>>> row = Row(569927288751587328, 'negative', 31)
>>> def large_neg_sentiment(row):
...     return row.airline_sentiment=='negative' and row.retweet_count > 10
...
>>> large_neg_sentiment(row)
True
```

However, the decorated and vectorized version of the function should produce this:

```python
>>> @TweetRows.deco('tweet_id', 'airline_sentiment', 'retweet_count')
... def large_neg_sentiment(row):
...     return row.airline_sentiment=='negative' and row.retweet_count > 10
>>> for row in large_neg_sentiment:
...     print(row)
...
Row(tweet_id=567897883875217408, airline_sentiment='negative', retweet_count=44)
Row(tweet_id=567909106553483264, airline_sentiment='negative', retweet_count=32)
Row(tweet_id=569927288751587328, airline_sentiment='negative', retweet_count=31)
Row(tweet_id=569932678688055296, airline_sentiment='negative', retweet_count=22)
Row(tweet_id=569950913554620416, airline_sentiment='negative', retweet_count=18)
```

The key thing is that we want to make sure `TweetRows` maintains the iterable and context manager behaviors you have already created.  Acting as a decorator should be an additional feature of the same object.  Notice that the decorated function itself becomes an iterable, **not** merely a function that returns an iterable.
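
One way to get this behavior is a class method that takes the column names and returns a decorator, which in turn replaces the function with a `TweetRows` instance that filters rows through it. This is a sketch of one possible design rather than the intended solution. It assumes a hypothetical `FakeRawTweets` stand-in with invented rows, and the `predicate` and `source` keyword arguments are illustrative additions.

```python
from collections import namedtuple

class FakeRawTweets:
    """Hypothetical stand-in for RawTweets, backed by invented rows."""
    _data = [(1, "negative", 44), (2, "positive", 50),
             (3, "negative", 3), (4, "negative", 18)]

    def __init__(self, *colnames):
        self._row = namedtuple("Row", colnames)
        self._pending = iter(self._data)
        self.closed = False

    @property
    def next_row(self):
        values = next(self._pending, None)
        return None if values is None else self._row(*values)

    def close(self):
        self.closed = True

class TweetRows:
    def __init__(self, *colnames, predicate=None, source=FakeRawTweets):
        self._raw = source(*colnames)
        self._predicate = predicate  # None means "keep every row"

    @classmethod
    def deco(cls, *colnames):
        """Return a decorator that swaps a row test for a filtered stream."""
        def decorator(func):
            return cls(*colnames, predicate=func)
        return decorator

    def __iter__(self):
        return self

    def __next__(self):
        # Skip rows lazily until one passes the predicate
        while True:
            row = self._raw.next_row
            if row is None:
                raise StopIteration
            if self._predicate is None or self._predicate(row):
                return row

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._raw.close()
        return False

@TweetRows.deco("tweet_id", "airline_sentiment", "retweet_count")
def large_neg_sentiment(row):
    return row.airline_sentiment == "negative" and row.retweet_count > 10

matching_ids = [row.tweet_id for row in large_neg_sentiment]
```

In this sketch the name `large_neg_sentiment` is bound to a single-pass iterator, so it can be consumed only once and is no longer callable on an individual row. Whether to keep it callable as well, for example by adding a `__call__` method, is a design choice left open here.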

In [3]:
# your code goes here



![orange-divider](https://user-images.githubusercontent.com/7065401/92672455-187a5f80-f2ef-11ea-890c-40be9474f7b7.png)
