# Code Refactoring

Summary of the problem:

To display streaming data as a graph, two Python files were written:

tweet_stream.py: collects streaming Twitter data

original.py: displays the data as a Bokeh graph, based on parameters from the streamed data

The first Python file runs fine and I am able to store data locally:

tweet_stream.py

In [None]:
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json

#Variables that contain the user credentials to access the Twitter API
access_token = ''
access_token_secret = ''
consumer_key = ''
consumer_secret = ''



#This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):

    def on_data(self, data):
        # print() as a function works in both Python 2 and Python 3
        print(data)
        return True

    def on_error(self, status):
        print(status)


if __name__ == '__main__':

    #This handles Twitter authentication and the connection to Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)

    #This line filters Twitter Streams to capture data by the keywords: 'python', 'javascript', 'ruby'
    stream.filter(track=['python', 'javascript', 'ruby'])

Streaming data can be stored locally in a text file:

In [None]:
python tweet_stream.py > yall5.txt
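
Once the stream has been redirected to a file, the saved tweets can be read back for analysis. The following is a minimal sketch, assuming the file holds one JSON object per line, possibly with blank lines or stray printed status codes in between. `load_tweets` is a hypothetical helper and is not part of original.py.

```python
import json


def load_tweets(path):
    """Return a list of dicts, one per line of `path` that holds a JSON object."""
    tweets = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            try:
                record = json.loads(line)
            except ValueError:
                continue  # skip partial or non-JSON lines
            if isinstance(record, dict):
                # a bare status code such as 420 parses as JSON but is not a tweet
                tweets.append(record)
    return tweets
```

A call such as `load_tweets('yall5.txt')` would then return the stored tweets as a list of dictionaries.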

When I run original.py*, I get a number of errors.

*(see the following for original code: https://github.com/lesleymaraina/RefactorCode)

## Challenge

I would like to learn more about the general strategies used to refactor code written in IPython Notebook format so that I can run the code independently of the notebook. The following highlights what I learned on Saturday and also notes a few additional questions. Your feedback would be greatly appreciated :).

### Strategy 1. Identify and list global variables used throughout the code in a new section at the top of the code. 

See the following as an example:

Original code:

In [None]:
f.scatter(geo_tweets[geo_tweets["python"]].latt, geo_tweets[geo_tweets["python"]].long, color="indianred", legend="javascript")
f.scatter(geo_tweets[geo_tweets["javascript"]].latt, geo_tweets[geo_tweets["javascript"]].long, color="indianred")
f.scatter(geo_tweets[geo_tweets["ruby"]].latt, geo_tweets[geo_tweets["ruby"]].long, color="blue",
          legend='ruby')


Strategy: Assign the terms "python", "javascript", and "ruby" to a single variable. These terms are used repeatedly throughout the code and could also be used by other members of the team. List the new variable at the top of the code.

Modified code:

In [None]:
########################################################
# GLOBAL VARIABLES
########################################################
"""
these will be variables that will be used by EVERY user who runs this code
"""
global_search_terms = ['python', 'javascript', 'ruby']

Q1: Other than identifying terms used repeatedly throughout the code, are there other patterns in the code that you look for that can be assigned as global variables?
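
One possible answer to Q1, as a sketch: besides repeated terms, other candidates are values that a teammate might want to change in one place, such as file names, plot dimensions, and colors. In the example below only `global_search_terms`, the file name, and the plot size come from the code above. The other names (`tweet_file`, `plot_size`, `term_colors`, `describe_series`) and the color chosen for "javascript" are assumptions made for illustration. PEP 8 suggests UPPER_CASE names for constants like these; the lowercase style of the existing variable is kept here.

```python
# Module-level settings shared by every run of the script.
global_search_terms = ['python', 'javascript', 'ruby']
tweet_file = 'yall5.txt'                      # input file (hypothetical name)
plot_size = {'width': 1200, 'height': 900}    # figure dimensions (hypothetical name)
term_colors = {                               # one color per term (illustrative values)
    'python': 'indianred',
    'javascript': 'seagreen',
    'ruby': 'blue',
}


def describe_series(terms, colors):
    """Return one (term, color) pair per search term, in order."""
    return [(term, colors[term]) for term in terms]


# Looping over the settings replaces three nearly identical hand-written lines.
for term, color in describe_series(global_search_terms, term_colors):
    print(term, color)
```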

### Strategy 2. Encapsulating blocks of code as independent functions

Code is often written in an abbreviated format that can be run in IPython Notebook development environments. Yet, when the code is copied into a separate Python file, I often run into errors when I try to run the code (see original.py).

The following block of code was encapsulated in an independent function.

Original Code:

In [None]:
f = figure(plot_width=1200, plot_height=900)
    
firstterm = f.scatter(geo_tweets[geo_tweets[global_search_terms[0]]].latt, geo_tweets[geo_tweets[global_search_terms[0]]].long, color="indianred", legend="python")
secondterm = f.scatter(geo_tweets[geo_tweets[global_search_terms[1]]].latt, geo_tweets[geo_tweets[global_search_terms[1]]].long, color="indianred")
thirdterm = f.scatter(geo_tweets[geo_tweets[global_search_terms[2]]].latt, geo_tweets[geo_tweets[global_search_terms[2]]].long, color="blue",
                          legend='you guys')
        
# stylistic stuff:                      
f.grid.grid_line_color = None
f.xaxis.axis_line_color = None
f.yaxis.axis_line_color = None
f.yaxis.major_tick_line_color = None
f.axis.major_label_standoff = 0                   
show(f)

Modified Code:

In [None]:
def plotter(firstterm,secondterm,thirdterm):
    f = figure(plot_width=1200, plot_height=900)
    
    firstterm = f.scatter(geo_tweets[geo_tweets[global_search_terms[0]]].latt, geo_tweets[geo_tweets[global_search_terms[0]]].long, color="indianred", legend="python")
    secondterm = f.scatter(geo_tweets[geo_tweets[global_search_terms[1]]].latt, geo_tweets[geo_tweets[global_search_terms[1]]].long, color="indianred")
    thirdterm = f.scatter(geo_tweets[geo_tweets[global_search_terms[2]]].latt, geo_tweets[geo_tweets[global_search_terms[2]]].long, color="blue",
                          legend='you guys')
        
    # stylistic stuff:                      
    f.grid.grid_line_color = None
    f.xaxis.axis_line_color = None
    f.yaxis.axis_line_color = None
    f.yaxis.major_tick_line_color = None
    f.axis.major_label_standoff = 0                   
    show(f)

Q2. When reviewing code in its original IPython Notebook format, what are the indicators of the beginning and end of a function? Are statements and calls such as "return" and "show()" always used to indicate breaks between independent functions?

Q3: Why were the f.scatter lines of code assigned to variables? Did I assign them correctly?
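
A note on the modified plotter above: its three parameters are overwritten on the first lines of the function body, so the arguments passed in have no effect, and the function reads geo_tweets and global_search_terms from module scope. A common alternative is to pass the data in as arguments and return a result. The following is only a sketch of that idea. It uses plain lists of dictionaries as a stand-in for the DataFrame so that it runs without pandas or Bokeh, and `points_for_term`, `build_series`, and `sample_tweets` are hypothetical names.

```python
def points_for_term(tweets, term):
    """Return (latt, long) pairs for tweets flagged as mentioning `term`."""
    return [(t['latt'], t['long']) for t in tweets if t.get(term)]


def build_series(tweets, terms, colors):
    """Return one dict per term describing what a scatter call would draw."""
    series = []
    for term in terms:
        points = points_for_term(tweets, term)
        series.append({
            'legend': term,
            'color': colors[term],
            'x': [p[0] for p in points],  # latt values, as in the original calls
            'y': [p[1] for p in points],  # long values
        })
    return series


# Made-up sample data, only to show the shape of the input.
sample_tweets = [
    {'latt': 40.7, 'long': -74.0, 'python': True, 'ruby': False},
    {'latt': 34.1, 'long': -118.2, 'python': False, 'ruby': True},
    {'latt': 41.9, 'long': -87.6, 'python': True, 'ruby': False},
]

for entry in build_series(sample_tweets, ['python', 'ruby'],
                          {'python': 'indianred', 'ruby': 'blue'}):
    print(entry['legend'], entry['color'], len(entry['x']))
```

Each returned dictionary holds the values that one f.scatter call would need, so the drawing code can stay short and nothing depends on a variable defined elsewhere in the file.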

### Strategy 3: Store variables in a __main__ function

Specific variables within the Python code were assigned in the __main__ function. These variables use functions defined in previous lines of code, which is why they are stored in the __main__ function.

Q3: Did I describe this correctly? By storing these variables in the __main__ function, does this essentially tell Python to execute the previously defined functions (i.e., pop_tweets, pop_keys) when the code is run in the terminal?

Original code:

In [None]:
geo_tweets = pop_keys(pop_tweets('yall.txt'), ['python', 'javascript', 'ruby'])
geo_tweets = pd.concat([geo_tweets, yall_tweets], ignore_index=True)

print('total geo_tweets:', len(geo_tweets))
print('python', len(geo_tweets[geo_tweets['python']]))

f = figure(plot_width=1200, plot_height=900,)
....
......
.......


Modified code:

In [None]:
if __name__ == '__main__':
    """
    these will be function calls that only you or maybe the team will run
    """
    geo_tweets = pop_keys(pop_tweets('yall5.txt'), global_search_terms)
    # yall_tweets = pop_keys(pop_tweets('yall_5.txt'), ['python', 'javascript', 'ruby'])
    geo_tweets = pd.concat([geo_tweets], ignore_index=True)
    
    print('total geo_tweets:', len(geo_tweets))
    print('python', len(geo_tweets[geo_tweets['python']]))
    
    plotter(global_search_terms[0],global_search_terms[1],global_search_terms[2])
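
To see what the `if __name__ == '__main__':` line does, it may help to note that it is an ordinary if statement rather than a function: Python sets `__name__` to `'__main__'` when a file is run directly, and to the module's name when the file is imported. The sketch below demonstrates this with the standard-library runpy module, which can execute a file under a chosen `__name__`. The small script and the names `pop_tweets_stub` and `calls_when_run_as` are made up for the demonstration.

```python
import os
import runpy
import tempfile

# A tiny script: it defines a function and calls it only under the guard.
MODULE_SOURCE = '''
calls = []

def pop_tweets_stub():
    calls.append('pop_tweets_stub')

if __name__ == '__main__':
    pop_tweets_stub()
'''


def calls_when_run_as(run_name):
    """Execute MODULE_SOURCE with __name__ set to `run_name`; return its calls list."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'demo_script.py')
        with open(path, 'w') as handle:
            handle.write(MODULE_SOURCE)
        module_globals = runpy.run_path(path, run_name=run_name)
    return module_globals['calls']


print(calls_when_run_as('__main__'))     # ['pop_tweets_stub']: guarded call ran
print(calls_when_run_as('demo_script'))  # []: function defined but never called
```

In both cases the function is defined; only the call under the guard depends on how the file is run.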

Q4: Formatting question. Why isn't the code written as follows (would this produce the same output?):

In [None]:
def main():
    geo_tweets = pop_keys(pop_tweets('yall5.txt'), global_search_terms)
    # yall_tweets = pop_keys(pop_tweets('yall_5.txt'), ['python', 'javascript', 'ruby'])
    geo_tweets = pd.concat([geo_tweets], ignore_index=True)
    
    print('total geo_tweets:', len(geo_tweets))
    print('python', len(geo_tweets[geo_tweets['python']]))
    
    plotter(global_search_terms[0],global_search_terms[1],global_search_terms[2])

if __name__ == '__main__':
    main()
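
One difference between the two layouts is variable scope. A name assigned directly under `if __name__ == '__main__':` is a module-level (global) variable, while a name assigned inside `def main():` is local to main. Since plotter as written reads geo_tweets from module scope, the main() version would likely raise a NameError, assuming geo_tweets is not also defined elsewhere at module level. The sketch below reproduces the effect with stand-in functions; all names except geo_tweets are hypothetical.

```python
def plotter_stub():
    """Stand-in for plotter(): looks up geo_tweets in module scope."""
    return len(geo_tweets)


def main_with_local():
    """Assigns geo_tweets locally, so plotter_stub() cannot see it."""
    geo_tweets = ['tweet a', 'tweet b']  # local to this function only
    return plotter_stub()


def count_tweets(tweets):
    """Stand-in for a plotter that receives its data as an argument."""
    return len(tweets)


def main_with_argument():
    """Passes the data explicitly, so no global lookup is needed."""
    tweets = ['tweet a', 'tweet b']
    return count_tweets(tweets)


try:
    main_with_local()
except NameError as error:
    print('NameError:', error)

print(main_with_argument())
```

Passing the data as an argument, as in main_with_argument, lets the main() layout work without relying on globals.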

Code source: http://aflyax.github.io/twitter-geo/