Unable to get data #49
Comments
OMG. That's probably the most overengineered piece of Python code I've ever seen. :D I wonder if it is autogenerated?.. The path to data is actually hardcoded here. OSCI/__app__/datalake/local/base.py Line 26 in f73e484
In your case it should be OSCI/__app__/datalake/local/base.py Lines 46 to 47 in f73e484
Unless There could be another explanation for this magic config override, and this is directly related to overengineered code is that it looks like somebody tried to apply singleton pattern to |
@RichardLitt these paths are automatically generated based on the config file. So you need to change your OSCI/__app__/config/files/default.yml Line 5 in f73e484
Change base_path: '/data' to base_path: '/Users/richard/src/OSCI/data', or to whatever other path you want.
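Since the original report mentions a missing /data folder or wrong permissions, a quick check like the sketch below can tell the two cases apart before rerunning the commands. The function name check_base_path is hypothetical and not part of OSCI; pass it whatever value base_path has in your default.yml.

```python
import os


def check_base_path(base_path):
    """Return a short reason why base_path is unusable, or None if it looks fine.

    Hypothetical helper, not part of OSCI.
    """
    if not os.path.isdir(base_path):
        return f"{base_path} does not exist or is not a directory"
    # Creating files inside a directory needs both write and execute permission.
    if not os.access(base_path, os.W_OK | os.X_OK):
        return f"{base_path} is not writable by the current user"
    return None


if __name__ == "__main__":
    problem = check_base_path("/data")  # the default base_path mentioned above
    print(problem or "base_path looks usable")
```

If the directory is simply missing, creating it (or pointing base_path at a directory you own) should be enough; if it exists but is not writable, the permissions need fixing instead.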
@abitrolly this path does not come from OSCI/__app__/datalake/local/base.py Line 26 in f73e484
It gets the path from the config |
@vlad-isayko so where is the code that does this? |
@vlad-isayko after the app is installed, it will not be able to look for |
@abitrolly , Lines 70 to 85 in 37535ea
Initiate local data lakes OSCI/__app__/datalake/datalake.py Lines 43 to 47 in 37535ea
Get config passed to the constructor OSCI/__app__/datalake/local/base.py Lines 25 to 33 in 37535ea
|
There is a file for this: prod.yml. It describes which environment variables the values will be taken from. The source of values is described through the value OSCI/__app__/config/files/prod.yml Lines 1 to 2 in 37535ea
So, for example, the value of OSCI/__app__/config/files/prod.yml Lines 7 to 9 in 37535ea
Secrets from Databricks, which are passed through the dbutils module (a proprietary module for Spark clusters in the Databricks environment), can also act as a source of values. An example can be found in prod-cluster.yml |
Thanks for the clarifications. The configuration code raises many questions.
Then it would be worth documenting them at https://github.com/epam/OSCI#configuration |
This is really not helpful. Please, be respectful. People have worked really hard on this code, and it does some really important work. @vlad-isayko Thank you! Should I download data from somewhere, first? Is the data included in this repo? |
@RichardLitt so how would you say that the code is overengineered and ask if it is autogenerated? |
It depends on what date you want to get results for. All our reports are YTD: the data is counted from the beginning of the year to the required date. For example, for February 13, 2021, you need to download and process data for all dates starting from January 1, 2021. So for each day, you need to run several commands sequentially. For example, for January 1, 2021:
# Load push events for 2021-01-01
python3 osci.py get-github-daily-push-events -d 2021-01-01
# Adds a company field to each commit and filters out non-company commits
python3 osci.py process-github-daily-push-events -d 2021-01-01
# Highlights repositories that had company commits that day
python3 osci.py daily-active-repositories -d 2021-01-01
# Load info from the GitHub API about repositories that had company commits that day
python3 osci.py load-repositories -d 2021-01-01
# Removes company commits that were pushed to repositories without licenses
# We assumed that having a license is a criterion for being open source (criterion suggested by Red Hat https://www.redhat.com/en/topics/open-source/what-is-open-source-software#:~:text=Open%20source%20software%20is%20released,legally%20available%20to%20end%2Dusers.)
python3 osci.py filter-unlicensed -d 2021-01-01
# Builds OSCI Ranking and OSCI Commits Ranking reports for January 1, 2021
python3 osci.py daily-osci-rankings -td 2021-01-01 |
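Because the reports are year-to-date, the six commands above have to be repeated for every day from January 1 up to the target date. A minimal sketch of that loop follows. It assumes osci.py is in the current working directory and that the subcommands and flags are exactly the ones quoted in this thread; the helper names ytd_commands and run_ytd are hypothetical and not part of OSCI.

```python
import subprocess
from datetime import date, timedelta

# Per-day steps in the order given in this thread: (subcommand, date flag).
DAILY_STEPS = [
    ("get-github-daily-push-events", "-d"),
    ("process-github-daily-push-events", "-d"),
    ("daily-active-repositories", "-d"),
    ("load-repositories", "-d"),
    ("filter-unlicensed", "-d"),
    ("daily-osci-rankings", "-td"),
]


def ytd_commands(target):
    """Build the command lines for every day from January 1 to `target`, inclusive."""
    commands = []
    day = date(target.year, 1, 1)
    while day <= target:
        for subcommand, flag in DAILY_STEPS:
            commands.append(["python3", "osci.py", subcommand, flag, day.isoformat()])
        day += timedelta(days=1)
    return commands


def run_ytd(target):
    """Run the commands one by one, stopping at the first failure."""
    for command in ytd_commands(target):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run: print the commands for January 1-2, 2021 instead of executing them.
    for command in ytd_commands(date(2021, 1, 2)):
        print(" ".join(command))
```

Printing the commands first is a cheap way to confirm the date range before calling run_ytd, which can take a long time for dates late in the year.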
I don't currently have the data. How do I download it? Is that what you're referring to, above? Is there any way to get data from before 2021? |
@abitrolly Asking if something is overengineered and autogenerated could be seen as a value judgement, by you, of the quality of the code. Someone has worked hard at that code. Asking "Hey, I'm having trouble finding the relevant areas in the code" is much kinder, because it makes the issue about you and not about their code. I always assume that if there's something I can't understand, it's because I am missing some information - which means that we can work together to solve that problem for others. Claiming that code is confusing is putting the blame on the other party, which isn't a good way to start a conversation for the maintainer. Anyone responding will often be doing so on their own time, so it's kind to make sure that they want to help you. |
@RichardLitt while I agree with you, I am biased in that I think this repository is not an open source project in a community sense, and all the work being done here is paid for by an outsourcing corporation that needs this project for marketing purposes. It doesn't make me a good person to treat paid developers differently from free-time maintainers, but at least they get compensated for their time. It is kind of a poor man's rant about those who are better off in a walled garden. Sorry about that. |
@vlad-isayko I'm sorry that the conversation has been derailed. I appreciate you and your work. Back to the issue at hand: I don't have any data locally. Where do I get it? Am I missing something? |
To clarify - I believe you tried to answer this question above, but the first command, |
Reread the docs. It's pretty clear I need to download data first from the GH Archive. I think that's what I was missing. Thanks, Vlad. |
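For reference, GH Archive publishes one gzipped JSON file per hour, named by date and a non-padded hour. The sketch below only illustrates that naming; it is an assumption that this is what the get-github-daily-push-events step fetches, and the helper name gharchive_hourly_urls is hypothetical and not part of OSCI.

```python
from datetime import date


def gharchive_hourly_urls(day):
    """Return the 24 hourly GH Archive URLs for one calendar day."""
    # Hours run from 0 to 23 and are not zero-padded in the file names.
    return [
        f"https://data.gharchive.org/{day.isoformat()}-{hour}.json.gz"
        for hour in range(24)
    ]


if __name__ == "__main__":
    for url in gharchive_hourly_urls(date(2021, 1, 1)):
        print(url)
```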
I am trying to run the example commands, and there doesn't seem to be a /data folder, or the permissions for it are wrong. Am I missing something? Thank you.
Error: