Final changes in documentation
ohduran committed Jul 21, 2017
1 parent 4b99e18 commit b1c4728
Showing 3 changed files with 26 additions and 13 deletions.
19 changes: 9 additions & 10 deletions docs/discussion.md
@@ -1,12 +1,13 @@
## Project discussion
### Index
1. Code discussion
2. Problem A discussion
2.1. Data quality discussion
2.2. Data analysis discussion
3. Problem B discussion
3.1. Data quality discussion
2. Data quality discussion
3. Problem A discussion
3.2. Data analysis discussion
3.3. Solution schema
4. Problem B discussion
4.1 Data analysis discussion
4.2 Solution schema

### Code:

@@ -16,16 +17,14 @@
- The structure of the data is inconsistent: most records don't have a "platform_name" field. For consistency, we extracted the name from the URL using regular expressions.
- The data describes time intervals, not time points: it is hard to discuss trends over intervals instead of individual data points.
- Handling dates is problematic here; it was decided to use the start_date as the point of reference, on the grounds that end_time is arbitrarily selected by the campaign manager, whereas the start_date isn't.
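
As a rough illustration of the platform name extraction mentioned above, here is a minimal sketch. The pattern, the function names and the `url` field are assumptions made for this example, not the project's actual code; it simply takes the first host label after an optional scheme and `www.` prefix.

```python
import re

# Hypothetical pattern: captures the first host label after an optional
# scheme and "www." prefix, e.g. "kickstarter" in
# "https://www.kickstarter.com/projects/1".
_PLATFORM_RE = re.compile(r"^(?:https?://)?(?:www\.)?([^/.:]+)\.", re.IGNORECASE)


def platform_from_url(url):
    """Return the lower-cased platform name found in url, or None."""
    match = _PLATFORM_RE.match(url)
    return match.group(1).lower() if match else None


def with_platform_name(record):
    """Return a copy of record with "platform_name" filled in when missing.

    Assumes each record is a dict with a "url" key (hypothetical field name).
    """
    if record.get("platform_name"):
        return record
    filled = dict(record)
    filled["platform_name"] = platform_from_url(record.get("url", ""))
    return filled
```

Records that already carry a "platform_name" are returned untouched, so the regular expression only acts as a fallback.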
### Data Quality
Attaching dates to certain concepts isn't enough: concepts tend to repeat themselves throughout the same campaign. In problem A, we filter that out by adding a single data point per campaign. In problem B, we distribute the money raised evenly over the duration of the campaign.
### Problem A:
#### Quality of the data
- Attaching dates to certain concepts isn't enough: concepts tend to repeat themselves throughout the same campaign. We filter that out by adding a single data point per campaign.
#### Analysis of the data
Interestingly enough, the assessment suggests counting the number of times a concept occurs at a given time, regardless of whether the occurrences were in the same campaign or across different campaigns. Although that would give more granularity about how many times someone, somewhere, used that word in a campaign, it is oblivious to the fact that a campaign describing its product by repeating the same term over and over doesn't make that term any more trending than others.

That is, if I created a campaign in which I constantly go over the fact that I want to open my own coffee shop, and describe different varieties of coffee in too much detail, that won't make coffee any more trending than a campaign that someone decided to call "Zuckerberg 2020" and that never mentions the candidate's name again in the description.

Thus, when counting the occurrences of a given concept, we took this issue into account and decided to count each concept just once per campaign. Discussing how the frequency of a word correlates with the success or failure of a given campaign would be a different issue that I believe is out of the scope of this assessment. In any case, the occurrences have still been reported within each campaign (after all, it is information provided, and keeping it increases the reusability of this project).
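
The once-per-campaign counting described here could look roughly like the sketch below. The function name and the `start_date` / `concepts` keys are assumptions for illustration only; the project's real data structures may differ.

```python
from collections import Counter


def concept_counts_by_date(campaigns):
    """Count, per start date, how many campaigns mention each concept.

    Assumes each campaign is a dict with a "start_date" and a list of
    "concepts" (hypothetical field names).
    """
    counts = {}
    for campaign in campaigns:
        day = campaign["start_date"]
        # set() collapses repeated mentions within a single campaign,
        # so each concept contributes at most 1 per campaign.
        unique_concepts = set(campaign["concepts"])
        counts.setdefault(day, Counter()).update(unique_concepts)
    return counts
```

With this approach, a campaign that repeats "coffee" a dozen times weighs exactly as much as one that mentions it once.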
### Problem B
#### Quality of the data
- Again, handling dates is problematic here: which dates should be used, given that the end_time is arbitrarily selected? In this case, because just using start_date might let enormous, long-running campaigns corrupt the index, it was decided to split the money raised by a campaign evenly across all the days the campaign was open. This is for lack of more data, and on the assumption that the real distribution of money raised is close to an even one when all the campaigns are aggregated.
#### Analysis of the data
Again, handling dates is problematic here: which dates should be used, given that the end_time is arbitrarily selected? In this case, because just using start_date might let enormous, long-running campaigns corrupt the index, it was decided to split the money raised by a campaign evenly across all the days the campaign was open. This is for lack of more data, and on the assumption that the real distribution of money raised is close to an even one when all the campaigns are aggregated.
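
The even split described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the `raised` key are hypothetical, dates are assumed to be `datetime.date` objects, and the end day is assumed to count as a day the campaign was open.

```python
from collections import defaultdict
from datetime import timedelta


def spread_evenly(start, end, amount):
    """Yield (day, share) pairs, splitting amount equally over start..end.

    Assumption: both start and end are counted as open days (inclusive).
    """
    n_days = (end - start).days + 1
    if n_days < 1:
        raise ValueError("end must not be before start")
    share = amount / n_days
    for offset in range(n_days):
        yield start + timedelta(days=offset), share


def daily_totals(campaigns):
    """Sum the per-day shares of every campaign into one total per day.

    Assumes each campaign is a dict with "start_date", "end_time" and
    "raised" keys ("raised" is a hypothetical field name).
    """
    totals = defaultdict(float)
    for campaign in campaigns:
        shares = spread_evenly(
            campaign["start_date"], campaign["end_time"], campaign["raised"]
        )
        for day, share in shares:
            totals[day] += share
    return dict(totals)
```

A long campaign then adds a small amount to many days instead of one large amount to its start date, which is the effect the discussion is after.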
10 changes: 7 additions & 3 deletions docs/index.md
@@ -3,9 +3,13 @@ This is a curated document that include information on how to run this program o

## Index
1. [Structure](structure.md)

Detailed explanation of how the project is arranged.

2. [Discussion](discussion.md)

Further discussion on the matter of the assessment, the code itself and how the implementation was addressed.

3. [Quick Start](quickstart.md)

How to run the program.
10 changes: 10 additions & 0 deletions docs/quickstart.md
@@ -0,0 +1,10 @@
# Quick Start Guide
Go to [run.py](run.py), either on the command line or using any Python IDE.

(If run in a terminal, make sure run.py is recognised as executable by using the following command:)

```
$ chmod a+x run.py
```

Then simply run run.py (and wait).
