Adds ClearlyDefined Data Importing Functions #2
We may have to install https://www.reviewnb.com/ to comment on
This is tracked in these tickets - nexb/scancode-toolkit#2050 and nexb/scancode-toolkit#2057
@pombredanne @MaJuRG Any updates?
This looks alright, but I have limited context as to what is going on here. Can you provide some simple instructions on how to run this, or an example output file?
else:
    dataframe.to_hdf(path_or_buf=file_path, key=df_key, mode='w', format=h5_format)

return
You won't need this return.
Yes! Removing This.
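For context on why the trailing `return` can go: a Python function that reaches the end of its body already returns `None`, so a bare `return` as the last statement changes nothing. A minimal sketch of that equivalence follows; the function names are hypothetical and not taken from the PR.

```python
def save_with_return(records):
    """Append a marker, then end with an explicit bare return."""
    records.append("saved")
    return  # redundant: falling off the end also returns None


def save_without_return(records):
    """Same behavior, without the trailing return."""
    records.append("saved")


first, second = [], []
result_a = save_with_return(first)
result_b = save_without_return(second)

# Both calls return None and leave their lists in the same state.
print(result_a is None and result_b is None)
print(first == second)
```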
""" | ||
# Loads Dataframes
path_json_dataframe = self.convert_records_to_json()
print(path_json_dataframe['path'][0])
Was this left accidentally?
Yes! I'll remove this.
@AyanSinhaMahapatra we can remove the null returns from methods at different places. Otherwise, LGTM.
    return dataframe

def store_dataframe_to_hdf5(self, dataframe, file_name, df_key, h5_format=HDF5_STORE_FORMAT, is_append=False):
We can have this as a static method.
Done.
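As a rough illustration of what the static version might look like, here is a sketch under several assumptions: the class name `DataFrameStore`, the constant values, and the whole `is_append` branch are hypothetical, since the thread only shows the `else` branch. The `mode`, `format` and `append` keywords are real parameters of pandas' `DataFrame.to_hdf`.

```python
import os

# Assumed values; the PR's real constants are not shown in this thread.
HDF5_STORE_FORMAT = "table"
HDF_DIR = os.path.join("data", "hdf5")


class DataFrameStore:
    """Hypothetical stand-in for the class under review."""

    @staticmethod
    def store_dataframe_to_hdf5(dataframe, file_name, df_key,
                                h5_format=HDF5_STORE_FORMAT, is_append=False):
        """Write `dataframe` under `df_key`, appending or overwriting."""
        file_path = os.path.join(HDF_DIR, file_name)
        if is_append:
            # Assumed branch: pandas can only append with the 'table' format.
            dataframe.to_hdf(path_or_buf=file_path, key=df_key, mode="a",
                             format="table", append=True)
        else:
            # Matches the branch shown in the diff: overwrite the file.
            dataframe.to_hdf(path_or_buf=file_path, key=df_key, mode="w",
                             format=h5_format)
```

Because the method no longer takes `self`, it can be called as `DataFrameStore.store_dataframe_to_hdf5(df, "projects_metadata.h5", "some_key")` without creating an instance (the key name here is made up). Writing real HDF5 files through pandas also needs the PyTables package installed.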
    self.metadata_filename = 'projects_metadata.h5'
    self.hdf_dir = os.path.join(os.path.dirname(__file__), 'data/hdf5/')

def get_hdf5_file_path(self, filename):
We can have this as a static method.
Done.
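One detail worth noting for this helper: the diff stores the directory on `self.hdf_dir`, which a static method cannot read. A possible way around that, sketched below, is to keep the directory as a class-level default and pass it in as a parameter. The class name `DataFrameStore` and the `hdf_dir` parameter are assumptions for illustration, and the sketch uses a relative path instead of `os.path.dirname(__file__)` so it runs anywhere.

```python
import os


class DataFrameStore:
    """Hypothetical stand-in for the class under review."""

    # Class-level default, mirroring the 'data/hdf5/' directory in the diff.
    HDF_DIR = os.path.join("data", "hdf5")

    @staticmethod
    def get_hdf5_file_path(filename, hdf_dir=HDF_DIR):
        """Return the path of `filename` inside the HDF5 data directory."""
        return os.path.join(hdf_dir, filename)


# No instance is needed to build a path.
print(DataFrameStore.get_hdf5_file_path("projects_metadata.h5"))
```

Taking the directory as an argument also makes the helper easy to point at a temporary directory in tests.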
    return file_path

# ToDo: Support Selective Query/Search
def load_dataframe_from_hdf5(self, file_name, df_key):
We can have this as a static method.
Done.
24bdc48 to 8d02524
Adds postgres.py to connect to and fetch data from postgres database.
Adds load_results_file.py and load_results_package.py for loading scancode results data into MultiIndexed Pandas DataFrames.
Adds .ipynb files for each file, to explain code and show the DataFrames.
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
8d02524 to a1926a6
Adds functions that take input data from ScanCode scan results, in JSON, and structure them similarly to the ClearlyDefined database data, so the same functions can be used on them.
Adds a Jupyter Notebook to explain the function calls and data, and JSON files as samples, from the issues in aboutcode-org/scancode-toolkit#1963.
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
8ff26a6 to 87fca1a
Add docs to run the jupyter-notebooks locally and in google colab.
Add all links to GSoC project in README, and the proposal.
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
87fca1a to 00f7a50
* Add PEP 517/518 pyproject.toml file
* Add setuptools_scm to handle versioning
* Add setup.py content to setup.cfg
* Update setup.py to act as a shim (so pip install -e works)
Addresses: aboutcode-org#2
Signed-off-by: Steven Esser <sesser@nexb.com>