Some issues of the daily datasets #20

Closed
halccw opened this issue Apr 28, 2014 · 4 comments

Comments

@halccw
Collaborator

halccw commented Apr 28, 2014

Recording some issues that still need to be solved.

  1. Missing cities
    Across the 7 current daily datasets (Rice, Wheat, Onion...), there are 1308 cities (or towns or markets) that are not covered by regions.csv. I have not found an efficient way to solve this. For the full list, see: https://github.com/fabbrix/humanitas/blob/master/data/india/csv_daily/agmarknet.nic.in/missing_cities_daily.csv
  2. Duplicate dates and abnormal spikes

Rice

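figure_1: https://cloud.githubusercontent.com/assets/4166714/2820142/af3b8d78-cef6-11e3-8ff8-c916a7bd3eb1.png

figure_2: https://cloud.githubusercontent.com/assets/4166714/2820141/af3b28c4-cef6-11e3-9e7c-fbfb7ee159aa.png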
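
The missing-cities check from item 1 can be sketched roughly as below. This is only an illustration under assumptions: the `city` column name in both files, the helper `find_missing_cities`, and the sample rows (including the prices and the `Pune` entry) are made up and do not reflect the actual layout of regions.csv or the daily datasets.

```python
import csv
import io


def find_missing_cities(daily_rows, region_rows, city_key="city"):
    """Return sorted city names that appear in the daily data but not in regions."""
    # Normalise names so case and stray whitespace do not cause false misses.
    known = {row[city_key].strip().lower() for row in region_rows}
    missing = {
        row[city_key].strip()
        for row in daily_rows
        if row[city_key].strip().lower() not in known
    }
    return sorted(missing)


# Tiny in-memory stand-ins for a daily dataset and regions.csv (hypothetical layout).
daily_csv = io.StringIO(
    "date,city,price\n"
    "28/01/2010,Gajapathinagaram,1500\n"
    "28/01/2010,Pune,1400\n"
)
regions_csv = io.StringIO("city,state\npune,Maharashtra\n")

missing = find_missing_cities(csv.DictReader(daily_csv), csv.DictReader(regions_csv))
print(missing)
```

Normalising the names only catches differences in case and whitespace; spelling variants between the two sources would still need manual or fuzzy matching.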

@mstefanro
Collaborator

Do duplicate dates exist in the daily data as well?

@halccw
Collaborator Author

halccw commented Apr 29, 2014

Yes, similar to the weekly ones.

The following is from daily Rice:

https://github.com/fabbrix/humanitas/blob/master/analysis/preproc/dup_daily.txt

@tonyo
Collaborator

tonyo commented Apr 29, 2014

Duplicates arise from weird tabular data.
See, for example, http://agmarknet.nic.in/cmm2_home.asp?comm=Rice&dt=28/01/2010, for Gajapathinagaram. There are two rows with empty subproducts, but the data is duplicated from previous rows.
I made some additional checks, and it looks like all rows with a missing subproduct are redundant.
@ChingChia Could you please try ignoring rows with an empty subproduct field and see what happens?
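
A minimal sketch of that filter, assuming each scraped row is a dict with a `subproduct` field. The field names, the helper `drop_empty_subproduct`, and the sample values are hypothetical, not taken from the project's code.

```python
def drop_empty_subproduct(rows, key="subproduct"):
    """Keep only rows whose subproduct field is non-empty after stripping whitespace."""
    return [row for row in rows if (row.get(key) or "").strip()]


# Made-up rows for one market and date; the last two mimic the redundant rows.
rows = [
    {"date": "28/01/2010", "market": "Gajapathinagaram", "subproduct": "Fine", "price": "1500"},
    {"date": "28/01/2010", "market": "Gajapathinagaram", "subproduct": "", "price": "1500"},
    {"date": "28/01/2010", "market": "Gajapathinagaram", "subproduct": "  ", "price": "1500"},
]

filtered = drop_empty_subproduct(rows)
print(len(filtered))
```

Rows where the field is missing entirely, `None`, or whitespace-only are treated the same as an empty string here.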

@halccw
Collaborator Author

halccw commented Apr 29, 2014

This is the duplication record after excluding rows with an empty subproduct (daily Rice):

https://github.com/fabbrix/humanitas/blob/master/analysis/preproc/dup_daily_rice_exclude_empty_subproduct.txt

The result looks good: each remaining pair of identical dates now has a single identical price. I can eliminate these duplicates by keeping only one row from each pair (the one with non-zero tonnes).
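
The "keep the non-zero tonnes row" rule could look roughly like this. It is a sketch under assumptions: the key fields (`date`, `market`, `subproduct`), the `tonnes` field name, and the helper `dedupe_prefer_nonzero_tonnes` are illustrative, and the sample values are made up.

```python
def dedupe_prefer_nonzero_tonnes(rows, key_fields=("date", "market", "subproduct")):
    """Keep one row per key, preferring a row whose tonnes value is non-zero."""
    best = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        current = best.get(key)
        # Treat a missing or empty tonnes value as zero.
        new_tonnes = float(row.get("tonnes") or 0)
        if current is None:
            best[key] = row
        elif float(current.get("tonnes") or 0) == 0 and new_tonnes != 0:
            # Replace the stored row only if it had zero tonnes and this one does not.
            best[key] = row
    # Dicts preserve insertion order, so keys come out in first-seen order.
    return list(best.values())


# Made-up duplicate pair: same date, market, subproduct and price, different tonnes.
rows = [
    {"date": "28/01/2010", "market": "A", "subproduct": "Fine", "price": "1500", "tonnes": "0"},
    {"date": "28/01/2010", "market": "A", "subproduct": "Fine", "price": "1500", "tonnes": "12.5"},
    {"date": "29/01/2010", "market": "A", "subproduct": "Fine", "price": "1510", "tonnes": "3"},
]

deduped = dedupe_prefer_nonzero_tonnes(rows)
print(deduped)
```

If both rows of a pair have non-zero tonnes, this sketch simply keeps the first one; whether that is the right choice depends on the data.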
