## 1. Setup and Essentials
### 1.1 Libraries

Setting up the library is as simple as importing the Conversation class, along with the progress bar utility. [tqdm](https://tqdm.github.io/) is a kickass progress-bar library for Python with Jupyter support, and you can see I'm importing the notebook version instead of the regular one. Replace it with ```from tqdm import tqdm``` if you're running in a terminal.

Note: I'm importing libraries as I need them in the notebook (instead of clustering all imports at the top, even though it physically hurts not to) so you can skip the ones you don't use.

In [None]:
from converse import Conversation
from tqdm import tqdm_notebook as tqdm

### 1.2 Creating a new Conversation

Creating a new conversation is as simple as creating a Conversation object and using the ```load``` function to add JSON message files. ```load``` can be called multiple times, and we'll cover loading multiple conversations later.

In [None]:
janes = Conversation() 
janes.load("Sample_Convo/message.json")

The ```messages``` attribute contains the plain JSON of all the messages loaded into the object, and it is directly accessible for additional flexibility.

In [None]:
print("Loaded %d messages" % len(janes.messages))

### 1.3 Exporting conversations

Conversations can be exported in three main ways: as plain JSON, as a [pandas](https://pandas.pydata.org/) DataFrame, or as a UTF-8 CSV. All three are shown below:

In [None]:
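import pandas as pd
import json
pd.set_option('display.max_rows', 10000)
# 1. UTF-8 CSV written to disk
janes.save_csv_utf8("test.csv")
# 2. Plain JSON (only the first 100 characters are printed)
print(json.dumps(janes.messages)[:100])
# 3. pandas DataFrame (only the first row is displayed)
janes.get_df()[:1]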
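In Python 3 the standard `csv` module handles Unicode directly, so if you ever need a similar export outside the library, a minimal sketch looks like this. The `export_csv_utf8` helper and the message fields (`sender`, `content`, `sentiment`) are hypothetical stand-ins, not the code or fields that `save_csv_utf8` actually uses.

```python
import csv
import io

def export_csv_utf8(messages, fileobj, fieldnames):
    """Write a list of message dicts as CSV; keys not in fieldnames are ignored."""
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(messages)

# Hypothetical messages containing non-ASCII text.
messages = [
    {"sender": "Jane Doe", "content": "Café tonight? ☕", "sentiment": 0.0},
    {"sender": "Me", "content": "Sounds great!", "sentiment": 1.0},
]

# For a real file, open it with an explicit encoding instead:
#   with open("test.csv", "w", newline="", encoding="utf-8") as f: ...
# An in-memory buffer keeps this example self-contained.
buffer = io.StringIO(newline="")
export_csv_utf8(messages, buffer, ["sender", "content", "sentiment"])
print(buffer.getvalue())
```

Passing `newline=""` lets the `csv` module control line endings itself, which is what the `csv` documentation recommends for file objects.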

### 1.4 Sentiment Analysis

Sentiment analysis is done with the [TextBlob](https://textblob.readthedocs.io/en/dev/) library. The ```get_sentiment``` function can be called directly to test it, and it can be modified to handle edge cases or to plug in other libraries. As we can see below, it works most of the time, but it isn't perfect.

Note: A subjectivity score is provided by the library, and while this is added to the data in the object, it's not really used for anything.

In [None]:
print(janes.get_sentiment("This is awesome!"))
print(janes.get_sentiment("DC movies are bad"))
print(janes.get_sentiment("Tony, I don't feel so good"))
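Since `get_sentiment` is meant to be modified for edge cases or swapped for another library, one way to structure that is to wrap any scoring function with a table of manual overrides. This is a minimal sketch under assumptions: `make_sentiment`, `toy_scorer` and its tiny lexicon are hypothetical stand-ins (in practice the base scorer would be TextBlob or another library), and `converse` may organize this differently.

```python
import re
from typing import Callable, Dict

# A deliberately tiny lexicon scorer standing in for TextBlob or another library.
LEXICON = {"awesome": 1.0, "good": 0.7, "bad": -0.7}

def toy_scorer(text: str) -> float:
    """Average the lexicon scores of known words; 0.0 if no word is known."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def make_sentiment(scorer: Callable[[str], float],
                   overrides: Dict[str, float]) -> Callable[[str], float]:
    """Return a scorer that checks a phrase-override table before delegating."""
    def get_sentiment(text: str) -> float:
        lowered = text.lower()
        for phrase, score in overrides.items():
            if phrase in lowered:
                return score  # edge case handled by hand
        return scorer(text)
    return get_sentiment

# Hypothetical edge case: the negation flips the meaning of "good".
get_sentiment = make_sentiment(toy_scorer, {"don't feel so good": -0.5})

print(get_sentiment("This is awesome!"))            # 1.0
print(get_sentiment("DC movies are bad"))           # -0.7
print(get_sentiment("Tony, I don't feel so good"))  # -0.5 (override)
```

Without the override, the toy scorer would rate the last sentence as positive because it only sees the word "good", which is the same kind of failure the cell above shows.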

## 2. Plotting

### 2.1 Plotting out of the box with plotly and Jupyter

Jupyter and plotly make it simple to create interactive plots from the object with very little code. I'm using plotly offline here; since plotly is installed as one of the dependencies, this should work right away, but you can replace it with whichever version you prefer. Using plotly online (if you have an account) makes your plots instantly shareable, and using ```plot``` instead of ```iplot``` exports the figure as a standalone HTML file that is easy to embed.

In [None]:
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

In [None]:
iplot(janes.plot())

### 2.2 Custom plotting

By default, the plot uses a single-day window with a 10-day moving average computed over the daily values. Let's change some of the options. For example, say you wanted a weekly window with 1-, 4- and 52-week moving averages:

In [None]:
iplot(janes.plot(timeframe="W",smas=[1,4,52]))

There we go! The plotting is modeled on how stock market data is currently plotted, since the next step for the library is to see whether technical analysis (TA) or harmonic analysis provides any insights. Is an OHLC chart available?

In [None]:
iplot(janes.plot(ohlc=True))

You bet! The OHLC colors can be further customized, and a lot more configuration is available. For details, consult the full documentation. For now, let's move on.
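To make the window, moving-average and OHLC options more concrete, here is a stdlib-only sketch of the underlying ideas: bucket per-message scores into daily windows, average each window, take a simple moving average (SMA) over those averages, and summarize each window as an OHLC bar (first, highest, lowest and last score). The helper names and sample data are hypothetical, and this illustrates the concepts rather than how `converse` actually computes its plots (it may, for example, handle empty days differently).

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def bucket_by_day(messages):
    """Group (timestamp, score) pairs by calendar day, in chronological order."""
    buckets = defaultdict(list)
    for ts, score in sorted(messages):
        buckets[ts.date()].append(score)
    return dict(sorted(buckets.items()))

def simple_moving_average(values, n):
    """SMA over the last n values; None until n values are available."""
    return [None if i + 1 < n else mean(values[i + 1 - n:i + 1])
            for i in range(len(values))]

def ohlc(scores):
    """Open, high, low, close of one window's chronologically ordered scores."""
    return scores[0], max(scores), min(scores), scores[-1]

# Hypothetical scored messages spread over three days.
messages = [
    (datetime(2017, 7, 14, 9), 0.5), (datetime(2017, 7, 14, 21), -0.25),
    (datetime(2017, 7, 15, 12), 0.75),
    (datetime(2017, 7, 16, 8), 0.25), (datetime(2017, 7, 16, 23), 0.5),
]

days = bucket_by_day(messages)
daily_means = [mean(scores) for scores in days.values()]
print(daily_means)                            # [0.125, 0.75, 0.375]
print(simple_moving_average(daily_means, 2))  # [None, 0.4375, 0.5625]
print([ohlc(scores) for scores in days.values()])
```

For a weekly window, you could key the buckets on `ts.isocalendar()[:2]` (ISO year and week number) instead of `ts.date()`.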

## 3. Filters

### 3.1 Multiple conversations

The same Conversation object can handle as many conversations as you'd like: just call ```load``` multiple times. In fact, we can load every conversation I've ever had!

In [None]:
from glob import glob

In [None]:
moreconvos = Conversation()
for filename in tqdm(glob("*/message.json")):
    moreconvos.load(filename)

In [None]:
print(len(moreconvos.messages))

That's a total of 847 conversations and 83,392 messages! We saw earlier that the sentiment analysis library can sometimes give unpredictable results. The library offers two ways to counter this: averaging and comparison. Averaging over larger units of time should smooth out some of the noise caused by incorrect predictions, and comparing two conversations should reveal the differences between them, assuming both share the same baseline of noise.
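The comparison argument relies on the noise being shared: if the sentiment model is off by roughly the same amount for both conversations, that bias cancels when you subtract one average from the other. A toy illustration with made-up numbers and an assumed constant bias (real model errors are rarely this tidy):

```python
from statistics import mean

# Hypothetical "true" per-message sentiment for two conversations.
true_a = [0.5, 0.25, 0.75, 0.5]
true_b = [0.0, 0.25, -0.25, 0.0]

# Assume the model adds the same systematic bias to every message.
BIAS = 0.125
scored_a = [s + BIAS for s in true_a]
scored_b = [s + BIAS for s in true_b]

print(mean(scored_a), mean(scored_b))   # 0.625 0.125 (both shifted by the bias)
print(mean(scored_a) - mean(scored_b))  # 0.5, the same as the true gap
print(mean(true_a) - mean(true_b))      # 0.5
```

Averaging helps with random, message-level errors; comparison helps with systematic ones that affect both conversations alike.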

Let's look at the columns in the dataframe once again:

In [None]:
moreconvos.get_df().columns

There are two stored attributes that help us tell conversations apart. The first is ```tag```, which is the title of the conversation. The second is ```participants```, which holds the participants in the conversation as JSON. Note that the tag can be overridden during loading, which is sometimes necessary because conversations may have been deleted or may not have a valid title.

### 3.2 Sets

The first step is to see who was talking and which conversations have been loaded. We can do this as follows (once again, I've masked most names for anonymity):

In [None]:
moreconvos.get_names()

The ```get_names``` function returns a Python set of all unique participants in the object, and ```get_tags``` works similarly for conversation titles. A fuzzy search function is also provided if you'd like to find similar names (or don't know exactly who you're searching for):

In [None]:
moreconvos.search_names("Jane Doe")

The ```search_names``` function returns all names as a list, ordered by decreasing likelihood of a match. This makes inverted first and last names, as well as family members, easier to find, especially if you talk to a lot of people (not as much of a problem for me).
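The text doesn't say how `search_names` scores its matches. As a stand-in, the sketch below ranks names with the standard library's `difflib.SequenceMatcher`, comparing the words of each name in sorted order so that inverted first and last names still match closely. The `normalize` and `rank_names` helpers and the sample names are hypothetical, not the library's implementation.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase and sort the words so 'Doe Jane' and 'Jane Doe' compare equal."""
    return " ".join(sorted(name.lower().split()))

def rank_names(query, names):
    """Return names sorted by decreasing similarity to the query."""
    q = normalize(query)
    return sorted(names,
                  key=lambda n: SequenceMatcher(None, q, normalize(n)).ratio(),
                  reverse=True)

names = {"Doe Jane", "John Doe", "Alice Smith", "Janet Doe"}
print(rank_names("Jane Doe", names))
```

Sorting the words is a cheap way to make word order irrelevant; a real fuzzy matcher might instead compare every word pair or use a dedicated string-matching library.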

### 3.3 Filters

Once we know what we're looking for, we can start narrowing the selection using filters. The following filters are provided:

1. filter_by_name(names)
2. filter_by_tag(tags)
3. filter_by_datetime(start, end)
4. filter_by_timestamp(start, end)
5. filter_by_sentiment(begin, end)

All the functions take an optional ```including``` parameter that selects either the messages matching the given parameters (the default) or, with ```including=False```, the messages outside them. The functions treat the object as immutable and return a new object containing only the messages that pass the filter.

For example, let's consider the weekly graph for my conversation from before:

In [None]:
iplot(janes.plot(timeframe="W",smas=[4,52]))

Let's say I want to focus on my conversation with Jay for the period from July 14 to October 13, 2017, which looks like a significant chunk for some reason. To do this, I'd first select my conversations with Jay and then filter by time:

In [None]:
from datetime import datetime

In [None]:
iplot(moreconvos \
    .filter_by_name("Jane Doe") \
    .filter_by_datetime(datetime(2017,7,14), datetime(2017,10,13)) \
    .plot(timeframe="W",smas=[4,52]))

This way, we can chain multiple filters to get the exact results we want, or store intermediate objects to speed things up.
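The filter behaviour described in this section (an immutable object, a new object returned by every filter, an `including` flag that flips the selection, and chaining) can be sketched in a few lines. `MiniConversation`, `Message` and their fields are hypothetical, chosen only to show the pattern; they are not `converse`'s classes, and the inclusive range bounds are an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Tuple

@dataclass(frozen=True)
class Message:
    sender: str
    sent_at: datetime
    sentiment: float

@dataclass(frozen=True)
class MiniConversation:
    messages: Tuple[Message, ...] = ()

    def _filter(self, keep: Callable[[Message], bool], including: bool):
        # Never mutate self: build and return a new object instead.
        return MiniConversation(tuple(m for m in self.messages
                                      if keep(m) == including))

    def filter_by_name(self, names, including=True):
        names = {names} if isinstance(names, str) else set(names)
        return self._filter(lambda m: m.sender in names, including)

    # Range bounds are inclusive here (an assumption for this sketch).
    def filter_by_datetime(self, start, end, including=True):
        return self._filter(lambda m: start <= m.sent_at <= end, including)

    def filter_by_sentiment(self, begin, end, including=True):
        return self._filter(lambda m: begin <= m.sentiment <= end, including)

convo = MiniConversation((
    Message("Jane Doe", datetime(2017, 7, 20), 0.5),
    Message("Jane Doe", datetime(2017, 12, 1), -0.25),
    Message("John Roe", datetime(2017, 8, 3), 0.0),
))

# Chained filters, mirroring the notebook's usage.
summer_jane = (convo
               .filter_by_name("Jane Doe")
               .filter_by_datetime(datetime(2017, 7, 14), datetime(2017, 10, 13)))
everyone_else = convo.filter_by_name("Jane Doe", including=False)

print(len(summer_jane.messages), len(everyone_else.messages), len(convo.messages))  # 1 1 3
```

Because each filter returns a fresh object, an intermediate result such as `convo.filter_by_name("Jane Doe")` can be stored and reused for several date ranges without reloading anything.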

### 3.4 Combining plots

Next, let's try comparing how my conversations went in that particular period by plotting Jay against everyone else. Since we're still returning plotly objects, combining plots is as simple as using the ```+``` operator:

In [None]:
jay_plot = moreconvos \
    .filter_by_name("Jane Doe") \
    .filter_by_datetime(datetime(2017,7,14), datetime(2017,10,13)) \
    .plot(timeframe="W",smas=[4],label="Jay")
    
everyone_else_plot = moreconvos \
    .filter_by_name("Jane Doe", including=False) \
    .filter_by_datetime(datetime(2017,7,14), datetime(2017,10,13)) \
    .plot(timeframe="W",smas=[4],label="Everyone Else")
    
iplot(jay_plot+everyone_else_plot)

This kind of analysis is useful for separating general trends over time from the volatility of an individual conversation. As we can see, Jay tends to run quite a bit happier than everyone else.

### 3.5 Annotations and Density plots

So far we've looked at charts of averaged sentiment scores. However, the astute reader will have noticed that it's quite hard to track down which messages contributed to which scores, and how this changes over time. In addition, since messages are usually sporadic, it would help to see how many messages contributed to a particular average: fewer messages make for a more volatile average, even with consistent windows.

For this purpose, the plotting function has a ```density``` option:

In [None]:
iplot(moreconvos \
    .filter_by_name("Jane Doe") \
    .plot(timeframe="W",smas=[4],label="Jane Doe",density=True))

The density plot tells us how frequently we were talking to each other, and it helps explain some of the more volatile values. A density plot is provided for each SMA added to the graph.
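One plausible way to compute a density series to sit alongside an average, sketched here as an assumption rather than the library's actual method, is to count the messages in each window and sum those counts over the same span as the SMA, so each averaged point comes with the number of messages behind it. The helper names and timestamps are hypothetical.

```python
from collections import Counter
from datetime import date, datetime, timedelta

def daily_counts(timestamps, start, end):
    """Messages per calendar day from start to end inclusive, zero-filled."""
    counts = Counter(ts.date() for ts in timestamps)
    days = (end - start).days + 1
    return [counts[start + timedelta(days=i)] for i in range(days)]

def rolling_sum(values, n):
    """Sum of the last n values (fewer at the start of the series)."""
    return [sum(values[max(0, i + 1 - n):i + 1]) for i in range(len(values))]

timestamps = [
    datetime(2017, 7, 14, 9), datetime(2017, 7, 14, 10), datetime(2017, 7, 14, 22),
    datetime(2017, 7, 16, 12),
]
per_day = daily_counts(timestamps, date(2017, 7, 14), date(2017, 7, 17))
print(per_day)                  # [3, 0, 1, 0]
print(rolling_sum(per_day, 2))  # [3, 3, 1, 1]
```

A point backed by a single message (like the last two here) can swing much further than one backed by many, which is the volatility the density plot helps explain.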

You can also annotate each data point with useful information. Two annotation functions are provided for this purpose:

* ```annot_highlow``` annotates each average value with the current message, as well as the highest- and lowest-scoring messages within that window.
* ```annot_current_with_subjectivity``` shows the exact sentiment score of the current message, along with its subjectivity score.

Both functions follow the same format (described in the documentation), and custom functions that provide other useful information can be plugged in the same way. The two functions are demonstrated below (for anonymity, I've had to replace the actual plots with images):

In [None]:
iplot(moreconvos \
    .filter_by_name("Jane Doe") \
    .plot(timeframe="W",smas=[4],label="Jane Doe",annotation=janes.annot_highlow))

In [None]:
iplot(moreconvos \
    .filter_by_name("Jane Doe") \
    .plot(timeframe="W",smas=[4],label="Jane Doe",annotation=janes.annot_current_with_subjectivity))

That's it! Oh, and the ```get_stats``` function provides a JSON with all the basic information about the object.
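The exact fields `get_stats` returns aren't listed here. If you want a similar summary for your own subset of messages, a minimal sketch with hypothetical field names (`message_count`, `participants`, `first`, `last`, `mean_sentiment`) could look like this:

```python
import json
from datetime import datetime
from statistics import mean

def basic_stats(messages):
    """Summarize non-empty (sender, timestamp, score) tuples as a JSON string."""
    senders = sorted({sender for sender, _, _ in messages})
    times = [ts for _, ts, _ in messages]
    return json.dumps({
        "message_count": len(messages),
        "participants": senders,
        "first": min(times).isoformat(),
        "last": max(times).isoformat(),
        "mean_sentiment": mean(score for _, _, score in messages),
    }, indent=2)

messages = [
    ("Jane Doe", datetime(2017, 7, 14, 9, 30), 0.5),
    ("Me", datetime(2017, 7, 14, 9, 31), 0.25),
    ("Jane Doe", datetime(2017, 7, 15, 20, 0), -0.25),
]
print(basic_stats(messages))
```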