# Developing the Western Business Sentiment Dictionary - Report III

## 1. Introduction

In this report, we address the question *"Do the media convey predominantly positive or negative messages about entrepreneurship, and if so, does that matter?"* by analyzing the sentiment of business-related sentences extracted from the Financial Times (FT) between January 2003 and December 2014. The sentences were selected with the same business-related keywords used in the earlier experiment with the New York Times (NYT). Upon completing the analysis, our conclusion is the same: *"Yes, the media convey positive messages."*

This part of the project was performed in three main steps:

1. Fulltext access
2. Sentence extraction
3. Sentiment analysis

First, the FT provided us with the full text of its articles from 2003 to 2014 in XML. Second, we wrote two Python scripts: one to unify the format of the articles across the different XML files, and another to search for the articles that contained the business-related keywords. Finally, we analyzed the sentiment of the sentences: the nearest centroid classifier previously trained for the NYT assigned a polarity (positive or negative) to each sentence extracted from the articles.

This work was performed by the [CulturePlex Lab](http://www.cultureplex.ca/) with contributions from Antonio Jiménez-Mavillard, Javier de la Rosa, Adriana Soto-Corominas, and Juan Luis Suárez. The code for this work can be found [here](https://github.com/mavillard/liwc).

## 2. Methodology

### a) Fulltext access

Unlike the experiment with the NYT, in which we retrieved the articles containing our business-related keywords through its API, here we obtained the full-text FT news dataset from 2003 to 2014 as XML files. These files were provided by FT staff and came from two sources, the newspaper and ft.com, each with a different format.

The first step was to parse the XML files. For each article, we extracted the following fields:
* title
* full text
* publication date
* unique id
* source
* url
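
As a rough illustration of this parsing step, the sketch below extracts the six fields from one article using Python's standard `xml.etree.ElementTree` module. The element names (`article`, `headline`, `body`, `pubdate`, `source`, `url`) and the `id` attribute are hypothetical: the actual FT schemas are not described in this report and differed between the newspaper and ft.com.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real FT XML schema is not documented in this report.
SAMPLE_XML = """
<articles>
  <article id="a1">
    <headline>Founder raises new funding</headline>
    <body>The founder said the company is growing. Investors agreed.</body>
    <pubdate>2010-05-17</pubdate>
    <source>ft.com</source>
    <url>http://example.com/a1</url>
  </article>
</articles>
"""

def parse_articles(xml_text):
    """Return one dict per <article> element with the six fields listed above."""
    root = ET.fromstring(xml_text)
    articles = []
    for node in root.iter("article"):
        articles.append({
            "id": node.get("id"),
            "title": node.findtext("headline", default="").strip(),
            "text": node.findtext("body", default="").strip(),
            "date": node.findtext("pubdate", default="").strip(),
            "source": node.findtext("source", default="").strip(),
            "url": node.findtext("url", default="").strip(),
        })
    return articles

articles = parse_articles(SAMPLE_XML)
```

Mapping both source formats onto one dictionary layout like this is one possible way to unify them before the sentence search.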

### b) Sentence extraction

Next, we searched every article and collected all the sentences that contained any of the business-related terms. The result was roughly half a million sentences, organized in a data frame with the following columns:
* the sentence itself
* the id of the article where the sentence came from
* the publication date of the article
* the source of the article
* the title of the article
* the url of the article
* the search term
* the category of the term (business vocabulary, big company, or new company)
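
The extraction step could look roughly like the following sketch. It is a simplified assumption rather than the project's actual script: the term list is a small hypothetical sample, the sentence splitter is a naive regular expression, and a sentence that matches several terms produces one row per term.

```python
import re

# Hypothetical term list; the report's full keyword list is not reproduced here.
TERMS = {
    "entrepreneur": "business vocabulary",
    "founder": "business vocabulary",
    "Microsoft": "big company",
    "Twitter": "new company",
}

# Naive splitter: break after ., ! or ? when followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def extract_sentences(article):
    """Yield one row per (sentence, matching term) pair found in an article."""
    for sentence in SENTENCE_END.split(article["text"]):
        for term, category in TERMS.items():
            # Whole-word, case-insensitive match of the search term.
            if re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE):
                yield {
                    "sentence": sentence,
                    "article_id": article["id"],
                    "date": article["date"],
                    "source": article["source"],
                    "title": article["title"],
                    "url": article["url"],
                    "term": term,
                    "category": category,
                }

# Invented article used only to exercise the function.
article = {
    "id": "a1",
    "title": "Founder raises new funding",
    "text": "The founder said Twitter is growing. Investors agreed.",
    "date": "2010-05-17",
    "source": "ft.com",
    "url": "http://example.com/a1",
}
rows = list(extract_sentences(article))
```

A list of dictionaries like `rows` can be passed directly to `pandas.DataFrame(rows)` to obtain a data frame with the columns listed above.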

### c) Sentiment analysis

In this phase, we used the nearest centroid classifier obtained in the experiment with the NYT to analyze the sentiment of the whole sentence dataset. The sentiment polarity (positive or negative) was added to the data frame in a new column.
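
To make the idea of nearest centroid classification concrete, here is a minimal pure-Python sketch. It assumes bag-of-words counts and Euclidean distance; the features, distance metric, and training data actually used for the NYT classifier are not stated in this report, and the training sentences below are invented for illustration.

```python
import math
from collections import Counter

def vectorize(sentence):
    """Bag-of-words counts over lowercase whitespace-separated tokens."""
    return Counter(sentence.lower().split())

def centroid(vectors):
    """Mean vector of a list of sparse count vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {word: count / len(vectors) for word, count in total.items()}

def distance(vec, center):
    """Euclidean distance between a sparse vector and a centroid."""
    words = set(vec) | set(center)
    return math.sqrt(sum((vec.get(w, 0) - center.get(w, 0)) ** 2 for w in words))

def train(labelled_sentences):
    """Compute one centroid per label from (sentence, label) pairs."""
    by_label = {}
    for sentence, label in labelled_sentences:
        by_label.setdefault(label, []).append(vectorize(sentence))
    return {label: centroid(vectors) for label, vectors in by_label.items()}

def predict(centroids, sentence):
    """Return the label whose centroid is closest to the sentence vector."""
    vec = vectorize(sentence)
    return min(centroids, key=lambda label: distance(vec, centroids[label]))

# Toy training data, invented for illustration only.
training = [
    ("profits rose strongly", "positive"),
    ("strong growth and record profits", "positive"),
    ("losses widened sharply", "negative"),
    ("the company reported heavy losses", "negative"),
]
centroids = train(training)
label = predict(centroids, "record profits for the founder")
```

Applying `predict` to every sentence in the data frame would give the new polarity column. In practice a library implementation such as scikit-learn's `sklearn.neighbors.NearestCentroid`, combined with a text vectorizer, would be a more realistic choice than this hand-rolled version.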

## 3. Results

The following table shows the number of negative and positive sentences, their percentages, and the total number of sentences, grouped by term category.

<table>
    <caption>General sentiment polarity by term category</caption>
    <thead>
        <tr><th>category</th><th>negative</th><th>positive</th><th>% neg</th><th>% pos</th><th>total</th></tr>
    </thead>
    <tbody>
        <tr><td>executive/manager</td><td>188155</td><td>221574</td><td>46%</td><td>54%</td><td>409729</td></tr>
        <tr><td>entrepreneur/founder</td><td>12703</td><td>35343</td><td>26%</td><td>74%</td><td>48046</td></tr>
        <tr><td>big companies</td><td>106342</td><td>156621</td><td>40%</td><td>60%</td><td>262963</td></tr>
        <tr><td>new companies</td><td>8780</td><td>19644</td><td>31%</td><td>69%</td><td>28424</td></tr>
    </tbody>
</table>
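
The percentage columns follow directly from the counts. The short sketch below recomputes them as a check; the counts are copied from the table, and the function name `positive_rate` is only an illustrative choice.

```python
# Counts copied from the table above: (negative, positive).
counts = {
    "executive/manager": (188155, 221574),
    "entrepreneur/founder": (12703, 35343),
    "big companies": (106342, 156621),
    "new companies": (8780, 19644),
}

def positive_rate(negative, positive):
    """Share of positive sentences among all classified sentences."""
    return positive / (negative + positive)

rates = {category: positive_rate(neg, pos) for category, (neg, pos) in counts.items()}

for category, rate in rates.items():
    print(f"{category}: {rate:.0%} positive, {1 - rate:.0%} negative")
```

The same ratio, computed per year instead of over the whole period, is the positive rate plotted in the figures that follow.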

Figure 1 shows this result over the period 2003 to 2014.

<div align="center">
    <figure>
        <img src="ft/executive_manager_total.png" width="600"/>
        <img src="ft/entrepreneur_founder_total.png" width="600"/>
        <img src="ft/big_companies_total.png" width="600"/>
        <img src="ft/new_companies_total.png" width="600"/>
        <figcaption>Fig.1 Distribution of sentences by category over the period 2003 to 2014</figcaption>
    </figure>
</div>

Figure 2 shows the corresponding positive rates.

<div align="center">
    <figure>
        <img src="ft/categories_pos_rate.png" width="600"/>
        <figcaption>Fig.2 Positive rate of sentences by category over the period 2003 to 2014</figcaption>
    </figure>
</div>

Figure 3 shows the total number of sentences and the number of positive and negative sentences by profession.

<div align="center">
    <figure>
        <img src="ft/executive_total.png" width="600"/>
        <img src="ft/manager_total.png" width="600"/>
        <img src="ft/entrepreneur_total.png" width="600"/>
        <img src="ft/founder_total.png" width="600"/>
        <figcaption>Fig.3 Distribution of sentences by profession over the period 2003 to 2014</figcaption>
    </figure>
</div>

Figure 4 shows the corresponding positive rates.

<div align="center">
    <figure>
        <img src="ft/professions_pos_rate.png" width="600"/>
        <figcaption>Fig.4 Positive rate of sentences by profession over the period 2003 to 2014</figcaption>
    </figure>
</div>

Figure 5 shows the total number of sentences and the number of positive and negative sentences for some big companies.

<div align="center">
    <figure>
        <img src="ft/chevron_total.png" width="600"/>
        <img src="ft/ford_total.png" width="600"/>
        <img src="ft/microsoft_total.png" width="600"/>
        <img src="ft/mckinsey_total.png" width="600"/>
        <figcaption>Fig.5 Distribution of sentences for big companies over the period 2003 to 2014</figcaption>
    </figure>
</div>

Figure 6 shows the corresponding positive rates.

<div align="center">
    <figure>
        <img src="ft/some_big_companies_pos_rate.png" width="600"/>
        <figcaption>Fig.6 Positive rate of sentences for big companies over the period 2003 to 2014</figcaption>
    </figure>
</div>

Figure 7 shows the total number of sentences and the number of positive and negative sentences for some new companies.

<div align="center">
    <figure>
        <img src="ft/facebook_total.png" width="600"/>
        <img src="ft/instagram_total.png" width="600"/>
        <img src="ft/linkedin_total.png" width="600"/>
        <img src="ft/netflix_total.png" width="600"/>
        <img src="ft/spotify_total.png" width="600"/>
        <img src="ft/twitter_total.png" width="600"/>
        <figcaption>Fig.7 Distribution of sentences for new companies over the period 2003 to 2014</figcaption>
    </figure>
</div>

Figure 8 shows the corresponding positive rates.

<div align="center">
    <figure>
        <img src="ft/some_new_companies_pos_rate.png" width="600"/>
        <figcaption>Fig.8 Positive rate of sentences for new companies over the period 2004 to 2014</figcaption>
    </figure>
</div>

## 4. Conclusions

The previous figures show the number of sentences extracted from the FT that contain certain business-related keywords. They also show the sentiment polarity of those sentences, as assigned by a nearest centroid classifier with an accuracy of 71%.

Figure 2 shows that the sentiment associated with new companies is more positive than that associated with big companies, and that the sentiment associated with professions such as entrepreneur or founder is more positive than that associated with executive or manager. The difference between professions can be seen in more detail in Figure 4.

Figures 6 and 8 reveal that the positive rate of big companies ranges from low (0.3) to high (0.7) values, while that of new companies remains steadily high, at around 0.6 to 0.9.

Our conclusion is that the media convey a predominantly positive sentiment about entrepreneurship.