Adds ClearlyDefined Data Importing Functions #2

Merged
merged 3 commits into from
Jul 2, 2020

Conversation

AyanSinhaMahapatra
Collaborator

Adds postgres.py to connect to and fetch data from postgres database.
Adds load_results_file.py and load_results_package.py for loading scancode results
data into MultiIndexed Pandas DataFrames.
Adds .ipynb files for each file, to explain code and show the DataFrames.

@AyanSinhaMahapatra
Collaborator Author

AyanSinhaMahapatra commented Jun 16, 2020

We may have to install https://www.reviewnb.com/ to comment on .ipynb diffs on cells.
Looks like this

@AyanSinhaMahapatra
Collaborator Author

This is tracked in these tickets - nexb/scancode-toolkit#2050 and nexb/scancode-toolkit#2057

@AyanSinhaMahapatra
Collaborator Author

@pombredanne @MaJuRG Any updates?

Contributor

@steven-esser steven-esser left a comment

This looks alright, but I have limited context as to what is going on here. Can you provide some simple instructions on how to run this, or an example output file?

        else:
            dataframe.to_hdf(path_or_buf=file_path, key=df_key, mode='w', format=h5_format)

        return

You won't need this return.

Collaborator Author

Yes! Removing this.
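
To illustrate the point in this thread: a Python function that ends without a `return` statement already returns `None`, so a trailing bare `return` adds nothing. A minimal sketch follows; the function names and the dictionary store are hypothetical and not taken from this PR.

```python
def save_with_bare_return(store, key, value):
    """Write a value, then end with a redundant bare return."""
    store[key] = value
    return  # redundant: falling off the end of the function also returns None


def save_without_return(store, key, value):
    """Same behavior as above, without the trailing return."""
    store[key] = value


store = {}
result_a = save_with_bare_return(store, "a", 1)
result_b = save_without_return(store, "b", 2)
```

Both calls leave the caller with `None`, so dropping the bare `return` should not change behavior for any caller.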

"""
# Loads Dataframes
path_json_dataframe = self.convert_records_to_json()
print(path_json_dataframe['path'][0])

Was this left accidentally?

Collaborator Author

Yes! I'll remove this.


@arnav-mandal1234 arnav-mandal1234 left a comment

@AyanSinhaMahapatra we can remove the bare `return` statements (which return nothing) from the methods in several places. Otherwise, LGTM.


return dataframe

def store_dataframe_to_hdf5(self, dataframe, file_name, df_key, h5_format=HDF5_STORE_FORMAT, is_append=False):

We can have this as a static method.

Collaborator Author

Done.

self.metadata_filename = 'projects_metadata.h5'
self.hdf_dir = os.path.join(os.path.dirname(__file__), 'data/hdf5/')

def get_hdf5_file_path(self, filename):

We can have this as a static method.

Collaborator Author

Done.
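
A rough sketch of what the static-method suggestion could look like for the path helper. The class name `DataStore` and the module-level `HDF_DIR` constant are assumptions made for illustration: the diff above builds `self.hdf_dir` from `__file__` inside the instance, so some way of supplying the directory without `self` is needed. Only `get_hdf5_file_path`, `projects_metadata.h5`, and the `data/hdf5/` directory come from the diff.

```python
import os

# Assumption: a module-level constant stands in for self.hdf_dir so the
# helper no longer needs instance state. The PR derives this from __file__.
HDF_DIR = os.path.join("data", "hdf5")


class DataStore:  # hypothetical class name, not from the PR
    @staticmethod
    def get_hdf5_file_path(filename):
        """Return the path of an HDF5 file inside HDF_DIR."""
        return os.path.join(HDF_DIR, filename)


# A static method can be called on the class itself, with no instance.
file_path = DataStore.get_hdf5_file_path("projects_metadata.h5")
```

The method still works when called on an instance, so existing call sites of the form `self.get_hdf5_file_path(...)` would not need to change.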

return file_path

# ToDo: Support Selective Query/Search
def load_dataframe_from_hdf5(self, file_name, df_key):

We can have this as a static method.

Collaborator Author

Done.

Adds postgres.py to connect to and fetch data from postgres database.
Adds load_results_file.py and load_results_package.py for loading scancode results
data into MultiIndexed Pandas DataFrames.
Adds .ipynb files for each file, to explain code and show the DataFrames.

Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Adds functions that take input data from Scancode Scan Results, in JSON, and
structure it similarly to the ClearlyDefined Database Data, so the same
functions can be used on both. Adds a Jupyter Notebook to explain the Function Calls
and Data, and JSON files as samples, from the issues in aboutcode-org/scancode-toolkit#1963.

Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Add docs to run the jupyter-notebooks locally and in google colab.
Add all links to GSoC project in README, and the proposal.

Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
@steven-esser steven-esser merged commit a5da2fa into aboutcode-org:master Jul 2, 2020
AyanSinhaMahapatra pushed a commit to AyanSinhaMahapatra/scancode-analyzer that referenced this pull request Nov 5, 2020
* Add PEP 517/518 pyproject.toml file
* Add setuptools_scm to handle versioning
* Add setup.py content to setup.cfg
* Update setup.py to act as a shim (so pip install -e works)

Addresses: aboutcode-org#2

Signed-off-by: Steven Esser <sesser@nexb.com>