_Lambda School Data Science_
## Productization Module 3, [Adding Data Science to a Web Application](https://github.com/LambdaSchool/DS-Unit-3-Sprint-4-Productization-and-Cloud/blob/master/module3-adding-data-science-to-a-web-application/README.md)

## Today's Plan:

### Templates (provided for you)
- `base.html`
- `prediction.html`
- `user.html`

### Functions (added by you)

#### `twitter.py`
- `add_or_update_user`
- `add_users`
- `update_all_users`

#### `predict.py`
- `predict_user`

#### `app.py`
- `@app.route('/user/<name>', methods=['GET'])`
- `@app.route('/user', methods=['POST'])`
- `@app.route('/compare', methods=['POST'])`
- `@app.route('/update')`

#### [GET and POST methods, explained](https://developer.mozilla.org/en-US/docs/Web/HTTP/Session#Request_methods)

HTTP defines a set of request methods indicating the desired action to be performed upon a resource. The most common requests are `GET` and `POST`:

- The `GET` method requests a data representation of the specified resource. Requests using `GET` should only retrieve data.
- The `POST` method sends data to a server so it may change its state. This is the method often used for HTML Forms.
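
As a rough illustration of that difference, here is a minimal in-memory sketch with no real HTTP involved. The `handle_request` function and the `users` list are hypothetical names invented for this example; the only point is that the `GET` branch reads state while the `POST` branch changes it.

```python
# Hypothetical in-memory "server state"; a real app would use a database.
users = []


def handle_request(method, data=None):
    """Dispatch on the request method, the way a route handler might."""
    if method == 'GET':
        # GET only reads: return a copy so callers cannot mutate our state.
        return list(users)
    if method == 'POST':
        # POST may change server state: here it records a new user name.
        users.append(data['user_name'])
        return list(users)
    raise ValueError('Unsupported method: {}'.format(method))


before = handle_request('GET')
handle_request('POST', {'user_name': 'KenJennings'})
after = handle_request('GET')
print(before, after)  # [] ['KenJennings']
```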


# 1. route `/user/<name>`

### Prototype interactively

In [10]:
from twitoff.__init__ import *
from twitoff.twitter import *

with APP.app_context():
    name = 'Austen'
    tweets = User.query.filter(User.name == name).one().tweets
    for tweet in tweets:
        print(tweet.text)

Why I always make sure I’m the one who installs node https://t.co/WHPIi9R4xw
One of the things I’ve found myself thinking a lot about is @justGLew’s “little things are big things.”

Tiny decisions and actions compound. 

I find when I’m off in one area of life I’m off in everything.
I don’t even hear about people hired making 3-4x what they used to until I check our stats.

That’s always a fantastic surprise. https://t.co/z4EYi0vYOF
Can’t imagine how scary it must have been to make that jump.

From high school orchestra teacher (for ten years) to software engineer, hired before graduation.

Congratulations Matt! https://t.co/lhm6z02cgU
I love that Rippling used a memo instead of a deck to raise, but you have to admit the steps to raising for them really were:

1. Be Parker Conrad
2. Have a revenue *growth* graph that is up and to the right while building on double digit million ARR (🤯) https://t.co/MBXWpgoKrU
I’m not a PR expert, but I’ve got to think, “let’s organize to make people mi

`with APP.app_context():` was needed above because Flask-SQLAlchemy queries require an active application context, and a notebook does not provide one the way `flask run` or `flask shell` does. For more information, see:

- http://flask-sqlalchemy.pocoo.org/2.3/contexts/
- http://flask.pocoo.org/docs/1.0/appcontext/

### Route in `TwitOff/twitoff/app.py`

Within the `create_app` factory function

```
    @app.route('/user/<name>')
    def user(name):
        tweets = User.query.filter(User.name == name).one().tweets
        return render_template('user.html', title=name, tweets=tweets)
```


### Template at `TwitOff/twitoff/templates/user.html`

`user.html` is like `base.html` except with a for loop iterating over tweets instead of users:

```
        {% for tweet in tweets %}
        <span class="stack">{{ tweet.text }}</span>
        {% endfor %}
```

# 2. Add new user 

### From notebook!

With [tqdm](https://github.com/tqdm/tqdm) for progress bars!

In [1]:
from tqdm.auto import tqdm

In [2]:
from twitoff.__init__ import *
from twitoff.twitter import *

def add_user(username):
    """Add a user and their Tweets"""
    twitter_user = TWITTER.get_user(username)
    db_user = User(id=twitter_user.id, name=username)
    DB.session.add(db_user)
    
    # We want as many recent non-retweet/reply statuses as we can get
    # 200 is a Twitter API limit, we'll usually see less due to exclusions
    tweets = twitter_user.timeline(
        count=200, exclude_replies=True, include_rts=False,
        tweet_mode='extended')
    db_user.newest_tweet_id = tweets[0].id
    
    # tqdm adds progress bar
    for tweet in tqdm(tweets): 
        # Calculate embedding on the full tweet, but truncate for storing
        embedding = BASILICA.embed_sentence(tweet.full_text,
                                            model='twitter')
        db_tweet = Tweet(id=tweet.id, text=tweet.full_text[:300],
                         embedding=embedding)
        db_user.tweets.append(db_tweet)
        DB.session.add(db_tweet)

    DB.session.commit()

In [3]:
with APP.app_context():
    add_user('KenJennings')


## Make it fault-tolerant: add _or update_ user

What if you try to add a user that's already been added? You get a database error:

> IntegrityError: UNIQUE constraint failed: user.id

So, we'll make our function fault-tolerant and "idempotent"!

#### [Idempotent REST APIs](https://restfulapi.net/idempotent-rest-apis/)

> When making multiple identical requests has the same effect as making a single request – then that REST API is called idempotent.

> When you design REST APIs, you must realize that API consumers can make mistakes. They can write client code in such a way that there can be duplicate requests as well. These duplicate requests may be unintentional as well as intentional some time (e.g. due to timeout or network issues). You have to design fault-tolerant APIs in such a way that duplicate requests do not leave the system unstable.
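
To make the definition concrete, here is a small sketch that uses a plain dictionary in place of the database. The names `add_user_strict` and `add_or_update_user_sketch` are made up for this illustration and are not part of the TwitOff code.

```python
def add_user_strict(table, user_id, name):
    """Not idempotent: a repeated call fails, like the UNIQUE constraint."""
    if user_id in table:
        raise ValueError('UNIQUE constraint failed: {}'.format(user_id))
    table[user_id] = name


def add_or_update_user_sketch(table, user_id, name):
    """Idempotent: repeating the call leaves the table in the same state."""
    table[user_id] = name


table = {}
add_or_update_user_sketch(table, 1, 'KenJennings')
add_or_update_user_sketch(table, 1, 'KenJennings')  # duplicate request
print(table)  # {1: 'KenJennings'}

strict_failed = False
try:
    add_user_strict(table, 1, 'KenJennings')  # duplicate of an existing row
except ValueError as e:
    strict_failed = True
    print('Strict add failed:', e)
```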

So, instead of assigning `db_user` to a new `User` ...

```
db_user = User(...)
```

We can assign `db_user` to an existing `User` **or** a new `User`:

```
    db_user = (User.query.get(twitter_user.id) or
               User(id=twitter_user.id, name=username))
```

This is a common pattern in web applications. If `User.query.get(twitter_user.id)` returns `None`, which is falsy, the `or` expression evaluates to its right-hand side, so `db_user` is assigned the new `User(id=twitter_user.id, name=username)` instead.

Here's a simpler demo of how **`or`** works in Python:

In [11]:
1 or 2

1

In [12]:
None or 2

2
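
One caveat worth keeping in mind: `or` falls through on any falsy value, not only `None`. That is harmless for the query above, because a Python object is truthy by default unless its class defines `__bool__` or `__len__`, but it matters when `0` or an empty string is a legitimate value. The `get_or_default` helper below is hypothetical and only illustrates the difference:

```python
def get_or_default(value, default):
    """Return `default` only when `value` is None, unlike plain `or`."""
    return default if value is None else value


print(0 or 2)                # 2, because 0 is falsy
print(get_or_default(0, 2))  # 0, because 0 is not None
print('' or 'fallback')      # fallback
print(None or 2)             # 2
```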

And now here's our `add_or_update_user` function:

In [4]:
def add_or_update_user(username):
    """Add or update a user and their Tweets"""
    twitter_user = TWITTER.get_user(username)
    db_user = (User.query.get(twitter_user.id) or
               User(id=twitter_user.id, name=username))
    DB.session.add(db_user)
    
    # We want as many recent non-retweet/reply statuses as we can get
    # 200 is a Twitter API limit, we'll usually see less due to exclusions
    tweets = twitter_user.timeline(
        count=200, exclude_replies=True, include_rts=False,
        tweet_mode='extended', since_id=db_user.newest_tweet_id)
    if tweets:
        db_user.newest_tweet_id = tweets[0].id
        
    # tqdm adds progress bar    
    for tweet in tqdm(tweets):
        # Calculate embedding on the full tweet, but truncate for storing
        embedding = BASILICA.embed_sentence(tweet.full_text,
                                            model='twitter')
        db_tweet = Tweet(id=tweet.id, text=tweet.full_text[:300],
                         embedding=embedding)
        db_user.tweets.append(db_tweet)
        DB.session.add(db_tweet)
        
    DB.session.commit()

Two more changes were made in the function above. 

[Tweepy has a `since_id` parameter:](http://docs.tweepy.org/en/3.7.0/api.html?highlight=since_id)

> `since_id` – Returns only statuses with an ID greater than (that is, more recent than) the specified ID.

We use this parameter so we don't re-retrieve and re-embed tweets we already have in the database. (If `db_user.newest_tweet_id` is `None`, no lower bound is applied and Tweepy returns the most recent tweets, up to the `count` limit.)

Also, we check that the API returned at least one tweet before reading the ID of the first one. This prevents an `IndexError` when a user has no tweets or, on an update, no new tweets since `newest_tweet_id`.

```
    if tweets:
        db_user.newest_tweet_id = tweets[0].id
```
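
The effect of `since_id` together with the empty-list check can be sketched without calling Twitter at all. In this illustration `fetch_timeline` is a hypothetical stand-in for the Tweepy call, and tweets are reduced to bare integer IDs listed newest first (an assumption that matches how the function above treats `tweets[0]` as the newest tweet):

```python
def fetch_timeline(all_ids, since_id=None):
    """Stand-in for the API: return IDs newer than since_id, newest first."""
    if since_id is None:
        return list(all_ids)
    return [tweet_id for tweet_id in all_ids if tweet_id > since_id]


def update_newest(newest_tweet_id, all_ids):
    """Return (new tweets, updated newest ID) for one update pass."""
    tweets = fetch_timeline(all_ids, since_id=newest_tweet_id)
    if tweets:  # guard: tweets[0] would raise IndexError on an empty list
        newest_tweet_id = tweets[0]
    return tweets, newest_tweet_id


timeline = [30, 20, 10]
first, newest = update_newest(None, timeline)     # first run gets everything
second, newest = update_newest(newest, timeline)  # rerun gets nothing new
print(first, second, newest)  # [30, 20, 10] [] 30
```

Running the update twice leaves `newest` unchanged, which is the idempotent behavior described above.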

Now the function is "idempotent"!

In [5]:
with APP.app_context():
    add_or_update_user('KenJennings')


### We can add more fault-tolerance, with try / except / else blocks

In [6]:
def add_or_update_user(username):
    """Add or update a user and their Tweets, error if not a Twitter user."""
    try:
        twitter_user = TWITTER.get_user(username)
        db_user = (User.query.get(twitter_user.id) or
                   User(id=twitter_user.id, name=username))
        DB.session.add(db_user)
        # We want as many recent non-retweet/reply statuses as we can get
        # 200 is a Twitter API limit, we'll usually see less due to exclusions
        tweets = twitter_user.timeline(
            count=200, exclude_replies=True, include_rts=False,
            tweet_mode='extended', since_id=db_user.newest_tweet_id)
        if tweets:
            db_user.newest_tweet_id = tweets[0].id         
        # tqdm adds progress bar
        for tweet in tqdm(tweets):
            # Calculate embedding on the full tweet, but truncate for storing
            embedding = BASILICA.embed_sentence(tweet.full_text,
                                                model='twitter')
            db_tweet = Tweet(id=tweet.id, text=tweet.full_text[:300],
                             embedding=embedding)
            db_user.tweets.append(db_tweet)
            DB.session.add(db_tweet)
    except Exception as e:
        print('Error processing {}: {}'.format(username, e))
        raise e
    else:
        DB.session.commit()
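
The control flow is easier to see in isolation: the `else` branch runs only when the `try` body finishes without raising, so the commit is skipped whenever anything fails. The sketch below uses a hypothetical `FakeSession` class rather than the real `DB.session`. It also calls `rollback()` in the `except` branch, which the function above does not do; that is a common addition when a failed session will be reused, not something the original code requires.

```python
class FakeSession:
    """Hypothetical stand-in for a database session that logs its calls."""

    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append('commit')

    def rollback(self):
        self.log.append('rollback')


def process(session, username, fail=False):
    try:
        if fail:
            raise ValueError('no such user: {}'.format(username))
    except Exception as e:
        print('Error processing {}: {}'.format(username, e))
        session.rollback()  # discard partial work before re-raising
        raise
    else:
        session.commit()    # reached only if the try body did not raise


session = FakeSession()
process(session, 'KenJennings')
try:
    process(session, 'not_a_user', fail=True)
except ValueError:
    pass
print(session.log)  # ['commit', 'rollback']
```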

# 2. Add multiple users

In [7]:
def add_users(users):
    """
    Add/update a list of users (strings of user names).
    May take a while, so run "offline" (interactive shell).
    """
    # tqdm adds progress bar
    for user in tqdm(users):
        add_or_update_user(user)

In [8]:
users = ['calebhicks', 'SteveMartinToGo', 'sadserver']

with APP.app_context():
    add_users(users)


# 3. Update all users

In [13]:
def update_all_users():
    """Update all Tweets for all Users in the User table."""
    # tqdm adds progress bar
    for user in tqdm(User.query.all()):
        add_or_update_user(user.name)

In [14]:
with APP.app_context():
    update_all_users()


# ASSIGNMENT

#### Add these functions to your Flask app
- Put `add_or_update_user`, `add_users`, and `update_all_users` in `twitter.py`
- Remove the `tqdm` progress bars from the for loops
- Import the functions in `app.py`

```python
from .twitter import add_or_update_user, add_users, update_all_users
```

#### Replace your `/user/<name>` route with these routes

```
    @app.route('/user', methods=['POST'])
    @app.route('/user/<name>', methods=['GET'])
    def user(name=None, message=''):
        name = name or request.values['user_name']
        try:
            if request.method == 'POST':
                add_or_update_user(name)
                message = "User {} successfully added!".format(name)
            tweets = User.query.filter(User.name == name).one().tweets
        except Exception as e:
            message = "Error adding {}: {}".format(name, e)
            tweets = []
        return render_template('user.html', title=name, tweets=tweets,
                               message=message)
```

***You will also need to add this import to the top of the file:*** `from flask import request`

#### Add an `/update` route

It should behave like the root route, except that it first calls `update_all_users` and then renders the page with an appropriate title, such as "All tweets updated!"

# 4. Predict!

In [15]:
import numpy as np
from sklearn.linear_model import LogisticRegression

In [16]:
user1_name = 'Austen'
user2_name = 'elonmusk'

In [17]:
with APP.app_context():
    user1 = User.query.filter(User.name == user1_name).one()
    user2 = User.query.filter(User.name == user2_name).one()
    user1_embeddings = np.array([tweet.embedding for tweet in user1.tweets])
    user2_embeddings = np.array([tweet.embedding for tweet in user2.tweets])
    user1_labels = np.ones(len(user1.tweets))
    user2_labels = np.zeros(len(user2.tweets))

In [18]:
user1_embeddings.shape, user2_embeddings.shape, user1_labels.shape, user2_labels.shape

((44, 768), (25, 768), (44,), (25,))

In [19]:
user1_embeddings

array([[-0.103232 , -0.062429 ,  0.902218 , ...,  0.753637 ,  0.169047 ,
         0.138494 ],
       [-0.665428 , -0.165065 ,  0.538794 , ...,  0.58845  ,  0.271845 ,
        -0.249931 ],
       [-0.308288 , -0.0839958,  0.689021 , ...,  0.372962 ,  0.031863 ,
         0.177151 ],
       ...,
       [-0.0963159,  0.266888 ,  1.11203  , ...,  0.718692 ,  0.283105 ,
        -0.694082 ],
       [-0.439849 , -0.168013 ,  0.324245 , ...,  0.664957 , -0.070829 ,
         0.673376 ],
       [-0.472292 , -0.141997 ,  0.482492 , ...,  0.599214 ,  0.0679974,
        -0.264179 ]])

In [20]:
user1_labels

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [21]:
user2_embeddings

array([[-0.0939891 , -0.0589629 ,  0.661137  , ...,  0.867942  ,
         0.442749  , -0.166158  ],
       [-0.488076  , -0.342566  ,  0.849732  , ...,  0.611597  ,
         0.256545  ,  0.131419  ],
       [-0.355265  , -0.0888007 ,  1.3744    , ...,  0.827497  ,
         0.188297  , -0.0391178 ],
       ...,
       [-0.356987  , -0.200916  ,  0.87187   , ...,  0.435991  ,
        -0.00627333,  0.16747   ],
       [-0.160194  , -0.400693  ,  1.10744   , ...,  0.644567  ,
         0.0514417 , -0.0711063 ],
       [-0.176499  , -0.113999  ,  1.26159   , ...,  0.725719  ,
         0.210874  ,  0.153916  ]])

In [22]:
user2_labels

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0.])

In [23]:
embeddings = np.vstack([user1_embeddings, user2_embeddings])
labels = np.concatenate([user1_labels, user2_labels])

embeddings.shape, labels.shape

((69, 768), (69,))

In [24]:
log_reg = LogisticRegression(solver='lbfgs', max_iter=1000)
log_reg.fit(embeddings, labels)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=1000, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='lbfgs',
          tol=0.0001, verbose=0, warm_start=False)

In [25]:
log_reg.score(embeddings, labels)

1.0

In [26]:
from sklearn.model_selection import cross_val_score
cross_val_score(log_reg, embeddings, labels, cv=3)

array([0.83333333, 0.7826087 , 0.77272727])

In [27]:
tweet_text = 'Income Share Agreements align incentives. Welcome to the future of education.'
tweet_embedding = BASILICA.embed_sentence(tweet_text, model='twitter')
log_reg.predict(np.array(tweet_embedding).reshape(1, -1))

array([1.])

In [28]:
log_reg.predict_proba(np.array(tweet_embedding).reshape(1, -1))

array([[0.00906047, 0.99093953]])

In [29]:
tweet_text = 'SpaceX will launch another Tesla into orbit'
tweet_embedding = BASILICA.embed_sentence(tweet_text, model='twitter')
log_reg.predict(np.array(tweet_embedding).reshape(1, -1))

array([0.])

In [30]:
log_reg.predict_proba(np.array(tweet_embedding).reshape(1, -1))

array([[0.86294539, 0.13705461]])

In [31]:
tweet_text = 'Today we launch a new initiative'
tweet_embedding = BASILICA.embed_sentence(tweet_text, model='twitter')
log_reg.predict_proba(np.array(tweet_embedding).reshape(1, -1))

array([[0.34059627, 0.65940373]])

# ASSIGNMENT

### Create `TwitOff/twitoff/predict.py`

Refactor the notebook code into a function, named `predict_user`.

The code you need is already here. You just need to put it in a function in a `.py` file.

The function should take three strings as parameters:
- User 1 name
- User 2 name
- Tweet text

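The function should return the model's prediction of which user is more likely to have written the given tweet text (`return log_reg.predict(...)`). As in the notebook, a prediction of `1` corresponds to User 1 and `0` to User 2.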

Import what you need from `numpy`, `sklearn`, and your `.models` and `.twitter` modules.

### Add this `/compare` route

```
    @app.route('/compare', methods=['POST'])
    def compare(message=''):
        user1, user2 = sorted([request.values['user1'],
                               request.values['user2']])
        if user1 == user2:
            message = 'Cannot compare a user to themselves!'
        else:
            prediction = predict_user(user1, user2, request.values['tweet_text'])
            message = '"{}" is more likely to be said by {} than {}'.format(
                request.values['tweet_text'], user1 if prediction else user2,
                user2 if prediction else user1)
        return render_template('prediction.html', title='Prediction', message=message)
```

***You will also need to add this import to the top of the file:*** `from .predict import predict_user`
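
The message logic in the route depends on two details: the names are sorted so that `user1` is always the name that comes first in Python's default string ordering, and a prediction of `1` is taken to mean `user1` (matching the notebook, where the first user's tweets were labeled with ones). Here is a sketch of just that logic, using a hypothetical `build_message` helper and a plain number in place of the model output:

```python
def build_message(name_a, name_b, tweet_text, prediction):
    """Format the comparison result; prediction 1 means the first sorted name."""
    user1, user2 = sorted([name_a, name_b])
    if user1 == user2:
        return 'Cannot compare a user to themselves!'
    winner, loser = (user1, user2) if prediction else (user2, user1)
    return '"{}" is more likely to be said by {} than {}'.format(
        tweet_text, winner, loser)


# 'Austen' sorts before 'elonmusk', so it becomes user1 regardless of order.
print(build_message('elonmusk', 'Austen', 'Hello', 1))
# "Hello" is more likely to be said by Austen than elonmusk
print(build_message('elonmusk', 'Austen', 'Hello', 0))
# "Hello" is more likely to be said by elonmusk than Austen
```

Because the route sorts the names before calling `predict_user`, the same pair of users always maps to the same labels no matter which form field each name was typed into.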