Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update maven detections #3447

Merged
merged 6 commits into from
Jul 13, 2023
Merged

Update maven detections #3447

merged 6 commits into from
Jul 13, 2023

Conversation

AyanSinhaMahapatra
Copy link
Member

@AyanSinhaMahapatra AyanSinhaMahapatra commented Jul 5, 2023

Update maven package and license detections:

  • Fix MANIFEST.MF parsing
  • Add package assembly for both pom.xml and MANIFEST.MF
  • implement functionality to do package specific license detection
  • run license detection on whole extracted license statement for maven
  • Add rules for license detection fixes
  • Fix detection for pom.properties file
  • Fix --todo plugin for package detection
  • Fix false positive case for bare word rules
  • Fix issue of unknown license detection in package manifest

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in uniquely-named feature branch and has no merge conflicts 📁
  • Looked for possible updates in documentation and added updates if applicable
  • Updated CHANGELOG.rst

Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Add a new config variable for datafile handlers, which if enabled
will run a package-ecosystem specific implementation of license
detection. Added a function to implement this for maven which only keeps
the relevant information and runs license detection on the whole
license statement and not on the respective values/attributes
one-by-one. Enabled this for maven to improve license detection there.

Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Adds a fix for maven packacge assembly, combining MANIFEST.MF and
pom.xml files into a single package, instead of multiple top level
packages. Also fixes pom.properties file parsing.

Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Copy link
Contributor

@JonoYang JonoYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AyanSinhaMahapatra Looks good, I just highlighted some small issues.

src/packagedcode/jar_manifest.py Outdated Show resolved Hide resolved
src/packagedcode/jar_manifest.py Outdated Show resolved Hide resolved
src/packagedcode/jar_manifest.py Outdated Show resolved Hide resolved
@AyanSinhaMahapatra AyanSinhaMahapatra force-pushed the update-maven-detections branch 2 times, most recently from 503f7d8 to 0165bdf Compare July 12, 2023 11:14
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see some nits for your consideration

src/licensedcode/data/rules/json_17.RULE Outdated Show resolved Hide resolved
src/licensedcode/data/rules/lgpl-2.1_and_apache-2.0_1.RULE Outdated Show resolved Hide resolved
src/licensedcode/data/rules/lgpl-2.1_and_apache-2.0_2.RULE Outdated Show resolved Hide resolved
src/licensedcode/detection.py Outdated Show resolved Hide resolved
if comment or doc_url:
package["extra_data"] = {}
if comment:
package["extra_data"]['notes'] = comment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to find a better way than using extra_data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #3447 (comment). We might want to add fields to the PackageData class then. And we might handle this separately, AFAIK this data is not really used anywhere and this is just a fix to make sure MANIFEST parsing returns everything else, and not crash.

@@ -278,7 +248,7 @@ def dget(s):
package['homepage_url'] = dget('Implementation-URL') or dget('Implementation-Url')

# Bundle-DocURL: http://logging.apache.org/log4j/1.2
package['documentation_url'] = dget('Bundle-DocURL')
doc_url = dget('Bundle-DocURL')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you are doing, but we are losing data here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are returning this package as-is to be used for creating a PackageData class, see https://github.com/nexB/scancode-toolkit/blob/update-maven-detections/src/packagedcode/maven.py#L138 with models.PackageData(**manifest,). We don't have a documentation_url or a notes field in PackageData, and this is why the Manifest parsing used to fail and not return anything. I'm just ensuring we return whatever we can, instead of crashing and not returning anything. And since I'm having this in extra_data, we're not really losing any data IMHO

return package


def parse_scm_connection(scm_connection):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TG1999 ping ... is this related or relevant to your purldb work?

if package_type in ['maven', 'jar']:
return JavaJarManifestHandler.datasource_id
elif package_type == 'osgi':
return JavaOSGiManifestHandler.datasource_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if a maven package also has OSGi thing?s

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can improve the parsing and handle more cases allright, but let's do this separately through another issue, and if you have an example of that case please provide the same. But this was also a fix, and I'm only making sure this works without crashing here.

src/packagedcode/licensing.py Outdated Show resolved Hide resolved
@AyanSinhaMahapatra
Copy link
Member Author

@pombredanne @JonoYang addressed all your comments, thanks!

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few extra nitrs that you can fix or not as you like before merging. Thanks!

src/packagedcode/licensing.py Outdated Show resolved Hide resolved
src/packagedcode/licensing.py Outdated Show resolved Hide resolved
Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
* Add rules for license detection fixes
* Fix detection for pom.properties file
* Fix --todo plugin for package detection
* Fix false positive case for bare word rules
* Fix issue of unknown license detection in package manifest

Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Adds additional license rules to take care of maven
package license detection adjustments.

Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants