Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify apache_kafka.py and related tests for migration #1042

Merged
merged 10 commits into from
Feb 10, 2023

Conversation

johnmhoran
Copy link
Member

No description provided.

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Please see feedback mostly to cleanup the code from comments so we can merge.

vulnerabilities/importers/apache_kafka.py Outdated Show resolved Hide resolved
vulnerabilities/importers/apache_kafka.py Outdated Show resolved Hide resolved
vulnerabilities/importers/apache_kafka.py Outdated Show resolved Hide resolved
vulnerabilities/importers/apache_kafka.py Show resolved Hide resolved
vulnerabilities/importers/apache_kafka.py Outdated Show resolved Hide resolved
vulnerabilities/importers/apache_kafka.py Outdated Show resolved Hide resolved
vulnerabilities/tests/test_apache_kafka.py Outdated Show resolved Hide resolved
@johnmhoran johnmhoran force-pushed the 972-migrate-apache-kafka-importer branch from a5f7248 to bc141b6 Compare January 3, 2023 22:24
vulnerabilities/importers/apache_kafka.py Outdated Show resolved Hide resolved
vulnerabilities/importers/apache_kafka.py Outdated Show resolved Hide resolved
@TG1999 TG1999 added this to the v32.0.0 milestone Jan 17, 2023
@TG1999
Copy link
Member

TG1999 commented Jan 23, 2023

@johnmhoran please rebase your branch and also provide importer improver logs for this importer.

@johnmhoran
Copy link
Member Author

@TG1999 After prep steps I ran into an error trying to run the apache_kafka importer.

(venv) Mon Jan 23, 2023 05:22 PM  /home/jmh/dev/nexb/vulnerablecode jmh (972-migrate-apache-kafka-importer)
$ ./manage.py import vulnerabilities.importers.apache_kafka.ApacheKafkaImporter
Importing data using vulnerabilities.importers.apache_kafka.ApacheKafkaImporter
Traceback (most recent call last):
  File "/home/jmh/dev/nexb/vulnerablecode/vulnerabilities/management/commands/import.py", line 60, in import_data
    ImportRunner(importer).run()
  File "/home/jmh/dev/nexb/vulnerablecode/vulnerabilities/import_runner.py", line 43, in run
    advisory_datas = importer_class().advisory_data()
  File "/home/jmh/dev/nexb/vulnerablecode/vulnerabilities/importer.py", line 321, in advisory_data
    raise NotImplementedError
NotImplementedError
Failed to run importer vulnerabilities.importers.apache_kafka.ApacheKafkaImporter. Continuing...
CommandError: 1 failed!: vulnerabilities.importers.apache_kafka.ApacheKafkaImporter

(venv) Mon Jan 23, 2023 05:23 PM  /home/jmh/dev/nexb/vulnerablecode jmh (972-migrate-apache-kafka-importer)
$

I don't understand why the raise NotImplementedError error was raised, since we do return AdvisoryData() objects -- perhaps this is related to our use of hard-coded values in lieu of code?

@johnmhoran
Copy link
Member Author

@TG1999 I've been unable to track down the source of the error. All tests pass. Perhaps we can discuss tomorrow?

@johnmhoran johnmhoran force-pushed the 972-migrate-apache-kafka-importer branch from 639979b to e016650 Compare January 24, 2023 21:05
@johnmhoran
Copy link
Member Author

@TG1999 All 10 GH checks have passed and both the apache_kafka importer and the default improver ran successfully (see attached file).
vcio-apache_kafka-import-and-improve-2023-01-24.txt

Please note, however, that the improver success message included a warning:

(venv) Tue Jan 24, 2023 02:09 PM  /home/jmh/dev/nexb/vulnerablecode jmh (972-migrate-apache-kafka-importer)
$ ./manage.py improve vulnerabilities.improvers.default.DefaultImprover
Improving data using vulnerabilities.improvers.default.DefaultImprover
Failed to get exact purls for AffectedPackage(package=PackageURL(type='apache', namespace=None, name='kafka', version=None, qualifiers={}, subpath=None), affected_version_range=ApacheVersionRange(constraints=(VersionConstraint(comparator='>=', version=SemverVersion(string='0.11.0.0')), VersionConstraint(comparator='<=', version=SemverVersion(string='2.1.0')), VersionConstraint(comparator='<', version=SemverVersion(string='2.1.1')))), fixed_version=None) Invalid constraints sequence: [VersionConstraint(comparator='>=', version=SemverVersion(string='0.11.0.0')), VersionConstraint(comparator='<=', version=SemverVersion(string='2.1.0')), VersionConstraint(comparator='<', version=SemverVersion(string='2.1.1'))]
Successfully improved data using vulnerabilities.improvers.default.DefaultImprover

(venv) Tue Jan 24, 2023 02:15 PM  /home/jmh/dev/nexb/vulnerablecode jmh (972-migrate-apache-kafka-importer)
$

@johnmhoran johnmhoran force-pushed the 972-migrate-apache-kafka-importer branch from 1fb0eef to 0b26584 Compare January 25, 2023 18:19
Copy link
Member

@TG1999 TG1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@pombredanne pombredanne self-assigned this Jan 26, 2023
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you++
See my comments for your consideration.

"affected_packages": [
{
"package": {
"type": "maven",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no such thing as an "apache_kafka" maven package. This is a 404: https://repo1.maven.org/maven2/apache_kafka

There are instead two possible package URLs to return:
1. pkg:apache/kafka (let's use this ONLY for now)
2. and many different Maven PURLs because there are many JARS in kafka. That's a job for an improver later.

To get an idea of what maven PURls would be, in this https://downloads.apache.org/kafka/3.3.2/kafka_2.12-3.3.2.tgz we have all these JARS:

  • kafka_2.13-3.3.2.jar
  • kafka-clients-3.3.2.jar
  • kafka-log4j-appender-3.3.2.jar
  • kafka-metadata-3.3.2.jar
  • kafka-raft-3.3.2.jar
  • kafka-server-common-3.3.2.jar
  • kafka-shell-3.3.2.jar
  • kafka-storage-3.3.2.jar
  • kafka-storage-api-3.3.2.jar
  • kafka-streams-3.3.2.jar
  • kafka-streams-examples-3.3.2.jar
  • kafka-streams-scala_2.13-3.3.2.jar
  • kafka-streams-test-utils-3.3.2.jar
  • kafka-tools-3.3.2.jar

That's only for a kafka build for Scale 2.12.
There is another for 2.13 with a second set of similar JARs.

All these 14 JARs would exist on Maven. For instance with version 3.3.2, we would have:

THEREFORE, let's use only a series of pkg:apache/kafka PURLs for now.
@johnmhoran please create an issue to later create an improver that will handle the maven creation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @pombredanne for your detailed comments. 👍 This test file was an earlier draft and I should have deleted it, doing so now; and I've opened a new issue re a maven importer as requested.

"qualifiers": null,
"subpath": null
},
"affected_version_range": "vers:maven/2.0.0|2.0.1|2.1.0|2.1.1|2.2.0|2.2.1|2.2.2|2.3.0|2.3.1|2.4.0|2.4.1|2.5.0|2.5.1|2.6.0|2.6.1|2.6.2|!=2.6.3|2.7.0|2.7.1|!=2.7.2|2.8.0.|!=2.8.1|<3.0.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure the range is using maven syntax for all their versions? including some older ones as found in http://archive.apache.org/dist/kafka/ ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pombredanne I don't understand your question, but I think this is superseded by your decision above to use only apache PURLs for now and to add an issue for a maven improver.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

vulnerabilities/importers/apache_kafka.py Outdated Show resolved Hide resolved
vulnerabilities/importers/apache_kafka.py Outdated Show resolved Hide resolved
vulnerabilities/importers/apache_kafka.py Outdated Show resolved Hide resolved
fixed_versions_string = fixed_versions_row.find_all("td")[1].text

# Remove leading white space after initial comma
affected_versions_string_split_SPLIT = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very variable name.... Consider this code instead:

            affected_versions = [v.strip() for v in affected_versions.split(",")]
            affected_versions = [v for v in affected_versions if v]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pombredanne After making your suggested changes I get an error re my hard-coded mapping:

TypeError: unhashable type: 'list'

Replacing the awkwardly-named

            affected_versions_string_split_SPLIT = [
                substring.strip()
                for substring in affected_versions.split(",")
                if not substring.isspace()
            ]

with

            affected_versions = [v.strip() for v in affected_versions.split(",")]
            affected_versions = [v for v in affected_versions if v]

has converted the previously-defined affected_versions = affected_versions_row.find_all("td")[1].text to a list, which won't work as a mapping/dict key. There's probably a simple fix for this, giving it some thought now....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pombredanne This is where the conversion of affected_versions to a list throws an error. It also applies to your subsequent comment re fixed_versions_string_split_SPLIT.

            # This throws a KeyError if the opening h2 tag `id` data changes or is not in the
            # hard-coded affected_version_range_mapping dictionary.
            if affected_version_range_mapping[cve_id]["action"] == "include":

                # These 2 variables (not used elsewhere) trigger the KeyError for changed/missing data.
                check_affected_versions_key = affected_version_range_mapping[cve_id][
                    affected_versions
                ]
                check_fixed_versions_key = affected_version_range_mapping[cve_id][
                    fixed_versions_string
                ]

I don't see how to modify the excerpted code to deal with your conversion of affected_versions to a list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, by simply changing the variable name ever so slightly to avoid using affected_versions, your approach works. Am I missing something? ;-) This way we use your cleaner code AND avoid converting affected_versions itself to an unwanted list.

            affected_versions_clean = [v.strip() for v in affected_versions.split(",")]
            affected_versions_clean = [v for v in affected_versions if v]

for substring in affected_versions_string.split(",")
if not substring.isspace()
]
fixed_versions_string_split_SPLIT = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My above comment seems to apply with equal vigor to your suggested fixed_versions_string_split_SPLIT improvement. All tests once again pass.

@johnmhoran
Copy link
Member Author

@pombredanne I've addressed all of your PR comments (thank you ;-), committed and pushed my updated code, and all GH checks have passed. Ready for you to review at your convenience.

Reference: #972

Signed-off-by: John M. Horan <johnmhoran@gmail.com>
…he_kafka #972

Reference: #972

Signed-off-by: John M. Horan <johnmhoran@gmail.com>
Reference: #972

Signed-off-by: John M. Horan <johnmhoran@gmail.com>
Signed-off-by: John M. Horan <johnmhoran@gmail.com>
Reference: #972

Signed-off-by: John M. Horan <johnmhoran@gmail.com>
Reference: #972

Signed-off-by: John M. Horan <johnmhoran@gmail.com>
Reference: #972

Signed-off-by: John M. Horan <johnmhoran@gmail.com>
Reference: #972

Signed-off-by: John M. Horan <johnmhoran@gmail.com>
Reference: #972

Signed-off-by: John M. Horan <johnmhoran@gmail.com>
Reference: #972

Signed-off-by: John M. Horan <johnmhoran@gmail.com>
@johnmhoran johnmhoran force-pushed the 972-migrate-apache-kafka-importer branch from b973738 to e33e18d Compare February 8, 2023 20:57
@johnmhoran
Copy link
Member Author

@pombredanne I've replaced the rather long mapping reference with a variable as you suggested and all tests and GH checks pass. 👍


# This throws a KeyError if the opening h2 tag `id` data changes or is not in the
# hard-coded affected_version_range_mapping dictionary.
cve_version_mapping = affected_version_range_mapping[cve_id]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I would prefer not having a _mapping suffix or similar type suffixes when this is not absolutely needed, but this is cosmetic we can fix this later!

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@pombredanne pombredanne merged commit 0c04da4 into main Feb 10, 2023
@pombredanne pombredanne deleted the 972-migrate-apache-kafka-importer branch February 10, 2023 10:25
@TG1999 TG1999 mentioned this pull request Feb 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants