## REST, JSON and Text

### Class Objectives

* Understand what REST and JSON are, and how to use them in Python
* Review string parsing and dummy variables in pandas (with merge, join, and concat)
* Review some of the useful functionality in nltk
* Review concepts of text handling (counting, stopwords, corpora, and dictionaries)

### Data Goals

1. Given Twitter data, can we build some understanding of what news to expect for a given hashtag?

2. Is there any difference in sentiment across hashtags about the same topic?

3. What creative ways can we engineer features in order to predict a hashtag?

### REST and JSON

REST stands for Representational State Transfer. It is a simplified architectural style for networked applications. Web frameworks such as Rails and Django encourage a RESTful design for its ease of use compared with other architectures, such as CORBA or SOAP.

RESTful applications do four primary things with "resources":

* GET: Retrieve the collection of data
* PUT: Replace or update the collection of data
* POST: Create a new collection of data
* DELETE: Delete the collection of data

REST is also the premise for AJAX requests, and it is the main way a client (your computer) interacts with a web server (where the web application resides). These AJAX calls primarily use JSON to send data between the server and client.
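
To make the four verbs concrete, here is a minimal sketch that builds (but never sends) one request per verb using only the standard library. It assumes Python 3, and the endpoint URL, the `build_request` helper, and the payloads are hypothetical and exist only for illustration.

```python
import json
from urllib.request import Request

# Hypothetical endpoint, used only for illustration; nothing is sent over the network.
BASE = "http://example.com/api/pets"

def build_request(method, url, payload=None):
    """Build an HTTP request object; JSON-encode the payload if one is given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(url, data=data, headers=headers, method=method)

requests_by_verb = [
    build_request("GET", BASE),                                  # retrieve the collection
    build_request("POST", BASE, {"name": "Rex"}),                # create a new resource
    build_request("PUT", BASE + "/1", {"name": "Rex", "age": "Adult"}),  # replace or update
    build_request("DELETE", BASE + "/1"),                        # delete a resource
]

for req in requests_by_verb:
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would actually perform the call; the third-party requests package wraps the same idea in functions such as `requests.get` and `requests.post`.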

JSON stands for JavaScript Object Notation. To a Python programmer, it looks much like a stringified version of a dictionary (or a list of dictionaries):

In [1]:
valid_json = '{ "some_kinda_key": "some_kinda_value" }'

also_valid_json = '[{ "some_kinda_key": "some_kinda_value" }, { "some_other_key": "some_other_value" }]'

actually_a_dictionary = {'some_kinda_key': 'some_kinda_value'}

Because JSON uses a hash/dictionary-like format, it is less rigidly structured than a CSV file: each record can carry its own set of keys. In practice, when you GET a series of items from the same collection through an API (Application Programming Interface), the records generally share the same shape.
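
As a quick check of the "stringified dictionary" idea, the sketch below parses the two JSON strings from the cell above with json.loads and converts one back with json.dumps. It is written for Python 3; the last few lines illustrate one difference from a Python dictionary literal, namely that JSON requires double quotes.

```python
import json

valid_json = '{ "some_kinda_key": "some_kinda_value" }'
also_valid_json = '[{ "some_kinda_key": "some_kinda_value" }, { "some_other_key": "some_other_value" }]'

# json.loads turns a JSON string into Python objects (dict, list, str, ...)
as_dict = json.loads(valid_json)
as_list = json.loads(also_valid_json)
print(type(as_dict).__name__, as_dict["some_kinda_key"])
print(type(as_list).__name__, len(as_list))

# json.dumps goes the other way, and the round trip preserves the data
round_trip = json.dumps(as_dict)
print(json.loads(round_trip) == as_dict)

# A single-quoted Python dict literal is not valid JSON
try:
    json.loads("{'some_kinda_key': 'some_kinda_value'}")
    single_quotes_ok = True
except ValueError as err:
    single_quotes_ok = False
    print("not valid JSON:", err)
```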

To load JSON into Python, we use the json module. To find some JSON, we'll dig it out of the network traffic of a website (Petfinder will do in this case).

In [2]:
import json

In [4]:
# open the file in a with block so the handle is closed once the JSON is parsed
with open('petfinder.json') as f:
    pets = json.load(f)

print type(pets)
print pets.keys()
print type(pets['results'])
print len(pets['results'])
print json.dumps(pets['meta_data'])
print type(json.dumps(pets))

<type 'dict'>
[u'meta_data', u'results', u'links']
<type 'list'>
15
{"total_results": 346182, "query": {"status": "Adoptable", "lon": "-73.9892", "page_size": "15", "page_number": "0", "location": "10003", "lat": "40.7316", "uri_prefix": "http://www.petfinder"}, "rows": 15, "page_number": 0, "query_id": "2A0BD832-546C-11E4-9E25-F1C9FFEE616C"}
<type 'str'>


In the above JSON, it looks like petfinder by default returns a list of 15 animals.

Petfinder also has an API, so you can get an API key and build requests using the requests package in Python. You should get similar results.

In [6]:
import pandas as pd

In [7]:
petdf = pd.DataFrame(pets['results'])
petdf                     

Unnamed: 0,age,animal,contact_name,coords,country_code,date_updated,description,email_address,export,export_api,...,primary_breed,record_nav_links,region_code,secondary_breed,shelter_id,shelter_name,size,species,status,street_address
0,Young,Dog,Samantha Brody,"40.7316,-73.9892",US,2014-10-08T18:34:07Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,Boxer,{u'next': u'http://www.petfinder/2A0BD832-546C...,NY,Staffordshire Bull Terrier,NY835,Social Tees Animal Rescue Foundation,Medium,Dog,Adoptable,please email dimitra.socialtees@gmail.com
1,Young,Dog,Samantha Brody,"40.7316,-73.9892",US,2014-08-20T20:56:44Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,Manchester Terrier,{u'previous': u'http://www.petfinder/2A0BD832-...,NY,,NY835,Social Tees Animal Rescue Foundation,Small,Dog,Adoptable,please email dimitra.socialtees@gmail.com
2,Adult,Dog,Samantha Brody,"40.7316,-73.9892",US,2014-03-07T20:09:56Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,Chihuahua,{u'next': u'http://www.petfinder/2A0BD832-546C...,NY,,NY835,Social Tees Animal Rescue Foundation,Small,Dog,Adoptable,please email dimitra.socialtees@gmail.com
3,Young,Dog,Samantha Brody,"40.7316,-73.9892",US,2014-07-15T01:11:37Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,Boston Terrier,{u'previous': u'http://www.petfinder/2A0BD832-...,NY,Chihuahua,NY835,Social Tees Animal Rescue Foundation,Small,Dog,Adoptable,please email dimitra.socialtees@gmail.com
4,Adult,Dog,Samantha Brody,"40.7316,-73.9892",US,2014-10-13T13:14:10Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,Shih Tzu,{u'previous': u'http://www.petfinder/2A0BD832-...,NY,Jack Russell Terrier,NY835,Social Tees Animal Rescue Foundation,Small,Dog,Adoptable,please email dimitra.socialtees@gmail.com
5,Young,Dog,Samantha Brody,"40.7316,-73.9892",US,2014-07-28T15:09:15Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,Chihuahua,{u'previous': u'http://www.petfinder/2A0BD832-...,NY,Terrier,NY835,Social Tees Animal Rescue Foundation,Small,Dog,Adoptable,please email dimitra.socialtees@gmail.com
6,Young,Dog,Samantha Brody,"40.7316,-73.9892",US,2014-04-27T19:30:38Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,Shih Tzu,{u'next': u'http://www.petfinder/2A0BD832-546C...,NY,,NY835,Social Tees Animal Rescue Foundation,Small,Dog,Adoptable,please email dimitra.socialtees@gmail.com
7,Young,Cat,Samantha Brody,"40.7316,-73.9892",US,2014-06-06T13:59:06Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,Tortoiseshell,{u'next': u'http://www.petfinder/2A0BD832-546C...,NY,,NY835,Social Tees Animal Rescue Foundation,Medium,Cat,Adoptable,please email dimitra.socialtees@gmail.com
8,Adult,Dog,Samantha Brody,"40.7316,-73.9892",US,2014-07-21T23:29:08Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,Chihuahua,{u'previous': u'http://www.petfinder/2A0BD832-...,NY,Terrier,NY835,Social Tees Animal Rescue Foundation,Small,Dog,Adoptable,please email dimitra.socialtees@gmail.com
9,Young,Dog,Dimitra Bennett,"40.7316,-73.9892",US,2014-09-20T23:27:52Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,dimitra@socialteesnyc.org,True,True,...,Labrador Retriever,{u'next': u'http://www.petfinder/2A0BD832-546C...,NY,Husky,NY835,Social Tees Animal Rescue Foundation,Medium,Dog,Adoptable,please email dimitra.socialtees@gmail.com


### Pulling data from text

One technique for working with strings in pandas is pulling out substrings. Imagine we wanted to create a feature called "is_terrier". Our steps would be to:

1. Use the two breed fields
2. Check whether either field includes the text "Terrier"
3. Set a 1 if either field has "Terrier", or a 0 if neither does.

Sample Code:

In [8]:
petdf['is_terrier'] = 0 # default to 0, so we only need to update rows where we find a match
# .loc replaces the deprecated .ix indexer; na=False treats a missing breed as "no match"
petdf.loc[petdf.primary_breed.str.contains('Terrier', na=False), 'is_terrier'] = 1
petdf.loc[petdf.secondary_breed.str.contains('Terrier', na=False), 'is_terrier'] = 1

print petdf.groupby('is_terrier').age.count()

is_terrier
0    8
1    7
Name: age, dtype: int64


### Creating dummy variables
Another task we'll commonly need to handle is converting categorical data into numerical data. While some languages (such as R) have built-in ways to encode categorical data, in Python we need to convert the categories to numbers ourselves. The get_dummies function in pandas does this by creating one 0/1 column per category.

After creating a dummy matrix, we'll want to use a join to bridge the data back together. Note the variety of ways to join data together in pandas:

function - description

* append - works much like list.append(): takes one data frame and appends another to the bottom of it (by rows). Note that DataFrame.append was removed in pandas 2.0; use concat instead.
* concat - takes a list of data frames and stacks them in the order given, by rows unless told otherwise.
* merge - links two data frames together, matching rows where the columns defined to merge on are equal.
* join - works much like merge, but matches on the indexes by default. Very commonly used alongside get_dummies.
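
The differences between these functions are easier to see on a toy example. The sketch below uses two small, made-up data frames (`toy_pets` and `toy_shelters` are hypothetical names and values, not the Petfinder data) to show concat, merge, and join with get_dummies side by side.

```python
import pandas as pd

# Made-up data for illustration only
toy_pets = pd.DataFrame({
    "name": ["Rex", "Tom", "Ace"],
    "size": ["Small", "Medium", "Small"],
})
toy_shelters = pd.DataFrame({
    "name": ["Rex", "Tom", "Ace"],
    "shelter_id": ["NY835", "NY835", "NY001"],
})

# concat: stack pieces by rows; splitting and re-stacking gives back the original
stacked = pd.concat([toy_pets[:2], toy_pets[2:]])

# merge: match rows where the "name" columns are equal
merged = toy_pets.merge(toy_shelters, on="name")

# join: match on the index, which get_dummies preserves
dummies = pd.get_dummies(toy_pets["size"])
joined = toy_pets.join(dummies)

print(stacked.equals(toy_pets))
print(list(merged.columns))
print(list(joined.columns))
```

Because get_dummies keeps the original row index, join needs no key column at all, which is why the two are so often used together.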

Below, we'll create a new data frame instance that "dummies" the size of the dog, and then attaches the sizes as new columns.

In [9]:
import numpy as np

# simple proof of concept on how concat works; otherwise, ignore
newdf = pd.concat([petdf[:7], petdf[7:]])
print newdf == petdf
# NaN is always false on equality (python gotcha!)
print
print 'numpy nan truth check!'
print np.nan == np.nan
print
sizes = pd.get_dummies(petdf['size'])
print sizes.head()

petdf_wsizes = petdf.join(sizes)
petdf_wsizes.head()

     age animal contact_name coords country_code date_updated description  \
0   True   True         True   True         True         True        True   
1   True   True         True   True         True         True        True   
2   True   True         True   True         True         True        True   
3   True   True         True   True         True         True        True   
4   True   True         True   True         True         True        True   
5   True   True         True   True         True         True        True   
6   True   True         True   True         True         True        True   
7   True   True         True   True         True         True        True   
8   True   True         True   True         True         True        True   
9   True   True         True   True         True         True        True   
10  True   True         True   True         True         True        True   
11  True   True         True   True         True         True        True   

Unnamed: 0,age,animal,contact_name,coords,country_code,date_updated,description,email_address,export,export_api,...,secondary_breed,shelter_id,shelter_name,size,species,status,street_address,is_terrier,Medium,Small
0,Young,Dog,Samantha Brody,"40.7316,-73.9892",US,2014-10-08T18:34:07Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,Staffordshire Bull Terrier,NY835,Social Tees Animal Rescue Foundation,Medium,Dog,Adoptable,please email dimitra.socialtees@gmail.com,1,1.0,0.0
1,Young,Dog,Samantha Brody,"40.7316,-73.9892",US,2014-08-20T20:56:44Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,,NY835,Social Tees Animal Rescue Foundation,Small,Dog,Adoptable,please email dimitra.socialtees@gmail.com,1,0.0,1.0
2,Adult,Dog,Samantha Brody,"40.7316,-73.9892",US,2014-03-07T20:09:56Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,,NY835,Social Tees Animal Rescue Foundation,Small,Dog,Adoptable,please email dimitra.socialtees@gmail.com,0,0.0,1.0
3,Young,Dog,Samantha Brody,"40.7316,-73.9892",US,2014-07-15T01:11:37Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,Chihuahua,NY835,Social Tees Animal Rescue Foundation,Small,Dog,Adoptable,please email dimitra.socialtees@gmail.com,1,0.0,1.0
4,Adult,Dog,Samantha Brody,"40.7316,-73.9892",US,2014-10-13T13:14:10Z,Please VISIT OUR FACEBOOK PAGE AND WEBSITE for...,samantha@socialteesnyc.org,True,True,...,Jack Russell Terrier,NY835,Social Tees Animal Rescue Foundation,Small,Dog,Adoptable,please email dimitra.socialtees@gmail.com,1,0.0,1.0


In [11]:
# coding=utf-8
from twitter import *
import twitterconfig as config

In [12]:
"""
there would be code here to set up a dictionary including everything required to load the authentication for twitter.
What are the advantages of setting up a dictionary here instead of just defining it below?
"""
t = Twitter(
    auth=OAuth(
        token=config.TOKEN,
        token_secret=config.TOKEN_SECRET,
        consumer_key=config.CONSUMER_KEY,
        consumer_secret=config.CONSUMER_SECRET)
    )

hashtags=[
    u'hongkong',
    u'occupycentral',
    u'umbrellarevolution',
    u'china',
    u'hk',
    u'admiralty',
    u'occupyhk',
]

all_results = []
for hashtag in hashtags:
    results = t.search.tweets(q='#'+hashtag, count=100, result_type='mixed')
    for r in results['statuses']:
        # keep the text as unicode while cleaning and encode only when printing;
        # mixing encoded bytes with unicode raises UnicodeDecodeError on non-ASCII tweets
        clean_tweet = r['text'].replace('\n', ' ').replace('\r', ' ').replace('"', "'")
        row = u','.join([unicode(r['id']), u'"'+clean_tweet+u'"', u'"'+hashtag+u'"'])
        all_results.append(row)
        print row.encode('utf-8')

724027359427473408,"Amazing stacked architecture of #HongKong shows housing of rather dense population.(Julia Wimmerlin via Mail Online) https://t.co/N3Tv5hfs5Z","hongkong"
724362388728762368,"#HR #Job alert: HR Operations Coordinator | Manulife | #HongKong https://t.co/KrW7wCZZxX #Jobs #Hiring","hongkong"
724361734312628224,"Check out the #HKTB for great family travel deals t Hong Kong - https://t.co/WRVNJDRCKn #HongKong #HongKongtravel","hongkong"
724361438710501377,"Interested in a #IT #job near #HongKong? This could be a great fit: https://t.co/UUwKsCovwS #CareersAtTU #LifeAtTU #Hiring","hongkong"
724361386747285505,"#anzacday2016 #anzacday in #hongkong amazing numbers attending #dawnservice https://t.co/qlSHzph7ej","hongkong"
724360996286922752,"3 days in #HongKong full of excitement, amazing food, hiking and amazing views..next stop Tokyo #ourcrazylives https://t.co/HmkRShKAXR","hongkong"
724360298614894597,"Moderate pollution (27) at 5AM. Low for #HongKong. Smile and go out for 

### Working with text and using NLTK

nltk stands for Natural Language Toolkit. Its primary purpose in our data science toolkit centers on the following functionality:

* parsing text objects into lists of tokens
* providing context and similarity between text
* containers for large amounts of text (various texts from project gutenberg are included for download)
* tagging words
* building predictive models using text

Below we'll load some previously generated Twitter data into a pandas data frame. With some integration work, it wouldn't be a stretch to stream tweets directly into a data frame, or into a SQL database that we could later query with pandas.

In [13]:
import pandas as pd

tweets = pd.read_csv('twitter5.csv')

print tweets

                      id                                              tweet  \
0     522205943074160640  Is this #HongKong 's Rodney King? Police need ...   
1     521669229188501504  'We won't move and I'm ready to get arrested',...   
2     522228472786067456  RT @stanyee: Footage of beating prompts #HongK...   
3     522228386002108418  What is happening in Hong Kong is something th...   
4     522228373964480512  #Funding:#HongKong #travel #startup @KlookTrav...   
5     522228351231344640  HK police use pepper spray on protesters, beat...   
6     522228250450206720  RT @stanyee: Footage of beating prompts #HongK...   
7     522228232330817536  RT @stanyee: Footage of beating prompts #HongK...   
8     522228119952822272  Squeezed. #vscocam #vsco_hub #vscogang #vscogr...   
9     522228020069679104  RT @stanyee: Footage of beating prompts #HongK...   
10    522227916659105792  BREAKING: #HongKong security chief says 6 poli...   
11    522227892198334464  Trendingnews: #hongkong #d

### Finding uniqueness of tweets

An important first check is the uniqueness of a dataset. Here, we measure uniqueness as the number of unique tweets divided by the number of tweets in the data set.

In [29]:
def tweet_uniqueness(series):
    return float(len(series.unique()))/len(series)
    #return (len(series.unique()) / len(series)) * 100

# shows that with the code above, we didn't get completely unique tweets.
print(tweet_uniqueness(tweets.id))

# code to drop duplicates based on the id column alone.
unique_tweets = tweets.drop_duplicates(subset=['id'])

print len(unique_tweets)
print tweet_uniqueness(unique_tweets.id)
print tweet_uniqueness(unique_tweets.tweet)

0.761363636364
1139
1.0
0.779631255487


In [30]:
for i in unique_tweets.hashtag.unique():
    print i, tweet_uniqueness(unique_tweets[unique_tweets.hashtag == i].tweet)

hongkong 0.863636363636
occupycentral 0.719047619048
umbrellarevolution 0.630872483221
china 0.950980392157
hk 0.970588235294
admiralty 0.716981132075
occupyhk 0.610778443114


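Next we'll use nltk to tokenize the tweets. Tokens are the individual units, usually words, that parsed text is split into.

Many of nltk's features need additional resources downloaded first with nltk.download(). The code below relies on the stop-word corpus (nltk.download('stopwords')), and the part-of-speech tagger used later needs its own model.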

In [32]:
import nltk

def tokenize_tweet(t, remove_stop=True, remove_hashtag=False):
    import string
    import re
    tweet = t.lower()
    # the underscore is stripped with the rest of the punctuation below,
    # so handles end up as the token TWITTERHANDLE
    tweet = re.sub(r'@\w+', 'TWITTER_HANDLE', tweet)
    tweet = re.sub(r'(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?', 'URL', tweet)
    if remove_hashtag:
        # How do we filter out the actual hashtag in the tweet itself?
        # Drop hashtags now, while the '#' is still there to identify them;
        # once punctuation is stripped, a hashtag looks like any other word.
        tweet = re.sub(r'#\w+', ' ', tweet)
    tweet = tweet.translate(string.maketrans("",""), string.punctuation)
    words = nltk.tokenize.wordpunct_tokenize(tweet)
    if remove_stop:
        # How do we filter for words in the stopwords corpus?
        stopwords_filter = set(nltk.corpus.stopwords.words('english'))
        words = [word for word in words if word not in stopwords_filter]
    return words


unique_tweets['tokens'] = unique_tweets.tweet.apply(tokenize_tweet, remove_stop=True)
unique_tweets['tokens_w_stopwords'] = unique_tweets.tweet.apply(tokenize_tweet, remove_stop=False)

print unique_tweets['tokens_w_stopwords']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


0       [is, this, hongkong, s, rodney, king, police, ...
1       [we, wont, move, and, im, ready, to, get, arre...
2       [rt, TWITTERHANDLE, footage, of, beating, prom...
3       [what, is, happening, in, hong, kong, is, some...
4       [fundinghongkong, travel, startup, TWITTERHAND...
5       [hk, police, use, pepper, spray, on, protester...
6       [rt, TWITTERHANDLE, footage, of, beating, prom...
7       [rt, TWITTERHANDLE, footage, of, beating, prom...
8       [squeezed, vscocam, vscohub, vscogang, vscogra...
9       [rt, TWITTERHANDLE, footage, of, beating, prom...
10      [breaking, hongkong, security, chief, says, 6,...
11      [trendingnews, hongkong, demonstranten, aangek...
12      [TWITTERHANDLE, in, hongkong, umbrellamovement...
13      [TWITTERHANDLE, in, hongkong, umbrellamovement...
14      [map, of, the, underpass, in, hongkong, where,...
15      [TWITTERHANDLE, in, hongkong, umbrellamovement...
16      [rt, TWITTERHANDLE, 37, men, and, 8, women, ar...
17            
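
To see what each step of the tokenizer does without any NLTK downloads, here is a simplified, standard-library-only sketch. It is not the function above: `simple_tokenize`, the tiny `STOPWORDS` set, the plain whitespace split, and the shortened URL pattern are all assumptions made for illustration, and the sample tweet is made up.

```python
import re
import string
from collections import Counter

# Tiny illustrative stop-word set (an assumption; NLTK's English list is much longer)
STOPWORDS = {"is", "this", "the", "of", "and", "in", "to", "a", "s"}

def simple_tokenize(tweet, remove_stop=True):
    """Lowercase, mask handles and URLs, strip punctuation, split on whitespace."""
    tweet = tweet.lower()
    tweet = re.sub(r"@\w+", "TWITTERHANDLE", tweet)      # mask user handles
    tweet = re.sub(r"https?://\S+", "URL", tweet)         # mask links (simplified pattern)
    tweet = re.sub("[%s]" % re.escape(string.punctuation), "", tweet)
    words = tweet.split()
    if remove_stop:
        words = [w for w in words if w not in STOPWORDS]
    return words

sample = "RT @stanyee: Footage of beating prompts #HongKong outrage http://example.com/x"
tokens = simple_tokenize(sample)
print(tokens)

# Counting tokens is then a one-liner
print(Counter(tokens).most_common(3))
```

Note that stripping punctuation also removes the "#", so after this step a hashtag is indistinguishable from an ordinary word.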

In [33]:
# pos_tag is a part-of-speech tagger, based on the text that it ingests.
# It needs some kind of sentence structure to work well, so we'll use the tokens with stopwords.
# While it's not built for twitter data, we can try it out and see how accurate it is
unique_tweets['pos'] = unique_tweets['tokens_w_stopwords'].apply(nltk.pos_tag)

# Collect all words that come back tagged as adjectives (JJ):
def find_all_adj(tagged_words):
    # tagged_words is a list of (word, tag) pairs, as returned by nltk.pos_tag
    return [word for word, tag in tagged_words if tag == 'JJ']
    
adjectives = unique_tweets.pos.apply(find_all_adj)

final_list = []
for i in list(adjectives):
    final_list.extend(list(set(i)))

final_list = list(set(final_list))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


In [34]:
print final_list

['heutejournal', 'cute', 'malicious', 'excessive', 'chinese', 'global', 'displayed', 'yellow', 'kong', 'violent', 'own', 'asian', 'human', 'innovative', 'japan', 'hate', 'suisheng', '116th', 'le', 'police', 'jian', 'leung', 'much', 'hurtful', 'extra', 'young', 'only', 'rich', 'rabble', 'cardinal', 'te', 'worth', 'real', 'good', 'animal', 'big', 'stop', 'possible', 'finish', 'dark', 'joint', 'traffic', 'shameful', 'anonymous', 'international', 'front', 'viral', 'trouble', 'occupyhongkong', 'helpful', 'cable', 'tear', 'shy', 'unnecessary', 'large', 'bad', 'small', 'passionate', 'insane', 'r', 'financial', 'fair', 'tepid', 'national', 'easy', 'dead', 'breakthrough', 'likely', 'economic', '3nsailing', 'complete', 'agricultural', 'corrupt', 'close', 'sexual', 'special', 'outrageous', 'tungchung', 'symbiotic', 'beaten', 'legal', 'creative', 'current', 'outside', 'indian', 'various', 'new', 'adorable', 'public', '3d', 'available', 'pepper', 'full', 'christian', 'pathetic', 'wan', 'sixth', 'fu

In [35]:
adjective_list = {
    'industrial': 0,
    'excessive': -1,
    'gratuitous': 1,
    'chaotic': -1,
    'national': 1,
    'young': 1,
    'yellow': 0,
    'high': 1,
    'middle': 0,
    'likely': 1,
    'economic': 0,
    'creative': 1,
    'open': 1,
    'physical': 0,
    'symbiotic': 1,
    'legal': 1,
    'next': 1,
    'genetic': 0,
    'angry': -1,
    'strong': 1,
    'peaceful': 1,
    'new': 1,
    'widespread': 1,
    'real': 1,
    'good': 1,
    'normal': 0,
    'successful': 1,
    'big': 1,
    'basic': -1,
    'hate': -1,
    'private': -1,
    'front': 0,
    'central': 0,
    'comfortable': 1,
    'last': 0,
    'helpful': 1,
    'third': 0,
    'many': 1,
    'clear': 1,
    'proud': 1,
    'brutal': -1,
    'large': 1,
    'dirty': -1,
    'professional': 1,
    'first': 0,
}

### Goal: Using the adjectives and some measure of sentiment, can we predict a given hashtag?

We'll need to build a couple more functions:

1. Let's write a function that creates a sentiment score for each tweet based on the adjectives above.

2. Set the targets of the model. We can do this two ways:

    1. Use the hashtag column, which was generated from the twitter search.
    
    2. Create several target columns, based on the tweet itself, using a regex match for the hashtag.

3. Finally, build a logistic regression with scikit-learn, using our sentiment score as the predictor.

In [36]:
%matplotlib inline
import seaborn as sns

# First function: create a sentiment score column.
# Should take in a list of words, and return back the score as
# mean(sentiment_of_adjectives)
def measure_sentiment(words):


# Second function: Create a numeric target column.
def numeric_hashtag(tag):
    # we could use a dictionary similar to above to easily map hashtags to numeric values.
    targets = {
        u'hongkong': 0,
        u'occupycentral': 1,
        u'umbrellarevolution':2,
        u'china':3,
        u'hk':4,
        u'admiralty':5,
        u'occupyhk':6,
    }
    return targets[tag]

unique_tweets['sentiment'] = unique_tweets.tokens.apply(measure_sentiment)
print unique_tweets.sentiment.hist()

unique_tweets['target'] = unique_tweets.hashtag.apply(numeric_hashtag)

# Build a logistic regression using the sentiment feature and the numeric hashtags
from sklearn import linear_model as lm

lmfit = lm.LogisticRegression()
lmfit.fit(unique_tweets[['sentiment']], unique_tweets['target'])
print lmfit.score(unique_tweets[['sentiment']], unique_tweets['target'])

IndentationError: expected an indented block (<ipython-input-36-e7f1e4052762>, line 11)
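
The IndentationError above arises because measure_sentiment was left without a body. One possible way to fill it in, following the comment's "mean(sentiment_of_adjectives)" description, is sketched below. The small `ADJECTIVE_SCORES` dictionary is a subset of adjective_list, used here so the example runs on its own, and returning 0.0 for tweets with no scored adjective is an assumption rather than something the notebook specifies.

```python
# A few entries copied from adjective_list above, so the sketch is self-contained
ADJECTIVE_SCORES = {
    "excessive": -1,
    "peaceful": 1,
    "yellow": 0,
    "brutal": -1,
    "good": 1,
}

def measure_sentiment(words, scores=ADJECTIVE_SCORES):
    """Return the mean score of the words that appear in the scores dictionary."""
    found = [scores[w] for w in words if w in scores]
    if not found:
        return 0.0  # assumption: treat tweets with no scored adjective as neutral
    return float(sum(found)) / len(found)

print(measure_sentiment(["police", "excessive", "brutal"]))   # -1.0
print(measure_sentiment(["peaceful", "yellow", "umbrella"]))  # 0.5
print(measure_sentiment(["rt", "hongkong"]))                  # 0.0
```

In the notebook, passing adjective_list as the scores argument (or making it the default) would apply the full dictionary to each tweet's tokens.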