Data collection #6

Closed
sayantikabanik opened this issue Dec 21, 2021 · 19 comments
Labels: help wanted (extra attention is needed)

Comments

@sayantikabanik
Owner

sayantikabanik commented Dec 21, 2021

Sources

@nitinjethwani7
Collaborator

@nitinjethwani7
Collaborator

Raw Data:
Rawdata.xls

Sample Dataset Features:

Sample_DF.xlsx

@nitinjethwani7
Collaborator

nitinjethwani7 commented Dec 27, 2021

Hackathon Google Drive link: Farm yield forecasting

https://drive.google.com/drive/folders/17kI6TMFvd4ehfXCPjBNo6-zjC5BoB01J?usp=sharing

@sayantikabanik
Owner Author

@sahithi02 references are here
@sayantikabanik go through the data

@sayantikabanik
Owner Author

The Excel version of the sample data looks broad, and a story behind it can be proposed,
while the hackathon dataset has masked location data for farms (which would be harder to present in the evaluation).

@sayantikabanik
Owner Author

sayantikabanik commented Dec 28, 2021

https://agmarknet.gov.in/

https://fcainfoweb.nic.in/Reports/Report_Menu_Web.aspx
(select the dates and a report is generated)
Daily Retail_Wholesale Report.pdf
@nitinjethwani7 @sahithi02 @anuraagbhavaraju take a look at this website and the data

@sayantikabanik
Owner Author

http://www.nhb.gov.in/OnlineClient/MonthlyPriceAndArrivalReport.aspx

We have to parse the data or pick up the JSON that the API returns
(we can pick a few products and set a data collection frequency).
Screenshot 2021-12-28 at 7 10 32 PM
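
A minimal sketch of the "pick a few products" idea, assuming the report can be obtained as JSON. The thread does not show the actual response, so the field names (`commodity`, `month`, `price`) and the sample payload below are hypothetical placeholders with made-up values, not the site's real schema.

```python
import json

# Hypothetical payload: the real response format is not shown in this thread,
# and these values are made up for illustration.
SAMPLE_PAYLOAD = """
[
  {"commodity": "Onion", "month": "2021-11", "price": 2500},
  {"commodity": "Potato", "month": "2021-11", "price": 1800},
  {"commodity": "Apple", "month": "2021-11", "price": 7000}
]
"""


def select_products(raw_json, wanted):
    """Keep only the records for the commodities we decided to track."""
    wanted_lower = {name.lower() for name in wanted}
    records = json.loads(raw_json)
    return [r for r in records if r["commodity"].lower() in wanted_lower]


selected = select_products(SAMPLE_PAYLOAD, ["onion", "potato"])
for row in selected:
    print(row["commodity"], row["month"], row["price"])
```

The same function could be called on a schedule (daily, monthly) once the collection frequency is agreed.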

@anuraagbhavaraju
Collaborator

> The excel version of sample data looks broad and a story behind can be proposed while the hackathon one has masked location data for farms (which would be harder to present in evaluation)

@sayantikabanik by hackathon data do you mean Rawdata.xlsx?

@anuraagbhavaraju
Collaborator

> http://www.nhb.gov.in/OnlineClient/MonthlyPriceAndArrivalReport.aspx
>
> we have to parse the data or pick the json which APi is throwing (we can pick a few products and set data collection frequency) Screenshot 2021-12-28 at 7 10 32 PM

@sayantikabanik this also has an export-to-Excel option, if that helps. But I am not sure why the rates from this source look weird. I checked a couple of vegetables, and all the rates are in the thousands.
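
One possible explanation, not confirmed anywhere in this thread, is that the report quotes prices per quintal (100 kg) rather than per kilogram. If that turns out to be the case, a conversion like the sketch below would bring the rates to a per-kg scale. The function name and the sample value are illustrative assumptions only.

```python
KG_PER_QUINTAL = 100  # 1 quintal = 100 kg


def per_quintal_to_per_kg(price_per_quintal):
    """Convert a price quoted per quintal to a price per kilogram."""
    return price_per_quintal / KG_PER_QUINTAL


# Made-up example value, assuming the unit really is "per quintal".
print(per_quintal_to_per_kg(2500))  # 25.0
```

The unit should be checked against the column headers of the exported report before applying any conversion.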

@sayantikabanik
Owner Author

We will be working with the sample DF from data.gov.in and with another model based on individual commodities.
@nitinjethwani7 will be sharing details on data collection allocation

sayantikabanik changed the title from "Data agriculture" to "Data collection" on Dec 28, 2021
@sahithi02
Collaborator

@nitinjethwani7
Collaborator

nitinjethwani7 commented Dec 29, 2021

Sample DF: V1 (2 YEARS COMPLETE DATA)
Sample_DF.xlsx

@anuraagbhavaraju
@sahithi02
@kullayappagithub
@krishnasastry37
Please help me extract data in the above format for any one location each. For example, you can download regional data for Hyderabad/Gujarat (region-wise) and prepare it in the format given in the sample DF file.

@sayantikabanik
Owner Author

sayantikabanik commented Dec 30, 2021

Data collection discussion

  • Summary: dropping the idea of location-based data for now
  • Use VLOOKUP or any other process on the raw data to get the required data

2014-2016 @sahithi02
2017-2019 @anuraagbhavaraju
2021 + 2013 @nitinjethwani7
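
For the VLOOKUP-style step, a plain-Python equivalent is a dictionary lookup keyed on the columns the two sheets share. This is only a sketch: the column names and rows below are hypothetical, since the layout of the raw data is not described in this thread.

```python
# Hypothetical rows standing in for the raw sheet and the target sheet.
raw_rows = [
    {"commodity": "Onion", "year": 2014, "price": 1200},
    {"commodity": "Potato", "year": 2014, "price": 900},
]
target_rows = [
    {"commodity": "Onion", "year": 2014},
    {"commodity": "Tomato", "year": 2014},
]

# Build the lookup table once, keyed on (commodity, year).
price_by_key = {(r["commodity"], r["year"]): r["price"] for r in raw_rows}

# Fill the target rows; a missing key yields None, much like VLOOKUP's #N/A.
filled = [
    {**row, "price": price_by_key.get((row["commodity"], row["year"]))}
    for row in target_rows
]
print(filled)
```

With pandas, a left `DataFrame.merge` on the shared key columns plays the same role.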

@sayantikabanik
Owner Author

Raw data - #6 (comment)

@sayantikabanik
Owner Author

sayantikabanik commented Dec 30, 2021

https://github.com/sayantikabanik/FP2/tree/main/forecasting_framework/data
(Add all the datasets here; I will be transferring them to the cloud soon.)

Please keep in mind that we will be using an API-based approach to build out the data pipelines, so doing the work in Python, including data cleaning, will be helpful in the long run.

@anuraagbhavaraju
Collaborator

anuraagbhavaraju commented Dec 30, 2021

@nitinjethwani7 @sayantikabanik @sahithi02 I uploaded a first version of Rawdata.xls, converted into the format that we discussed this morning, to the link below under the name 'FPData v1.0.xlsx'. I wrote a short Python script to do this job.

https://github.com/sayantikabanik/FP2/tree/main/forecasting_framework/data

@nitinjethwani7 I did it for the entire file (all time periods). Please have a look and let me know if it needs any modifications.

@nitinjethwani7
Collaborator

@anuraagbhavaraju Please help me integrate the rainfall data into the same file that you added. Here is the rainfall data till 2015:
Rainfall data till 2015.xls

@anuraagbhavaraju
Collaborator

> @anuraagbhavaraju -Please help me integrate rainfall data in the same file which you added.Here is the rainfall data till 2015 Rainfall data till 2015.xls

Hi Nitin, this is done. I uploaded FPData v1.1.xlsx

@sayantikabanik
Owner Author

We need the logic you used for the extraction from the raw format.
Also, please add one comment block with this info:

  • Final raw sources used
  • Logic used
  • Any potential changes


6 participants