Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help Requested: Oracle SQL #2755

Closed
jeremylong opened this issue Aug 18, 2020 · 17 comments
Closed

Help Requested: Oracle SQL #2755

jeremylong opened this issue Aug 18, 2020 · 17 comments

Comments

@jeremylong
Copy link
Owner

Significant performance improvements have been made for MySQL in the 6.x branch and we are hoping to get someone to submit a PR to update the initialize_oracle.sql file to support the changes required for 6.0.0. The changes made for MySQL can be seen here: 6a6d21b#diff-23975a7378873c27620d7ccee40b72b8

It would be very helpful if someone could create a PR to perform the medications for:
https://github.com/jeremylong/DependencyCheck/blob/addCvssVersions/core/src/main/resources/data/initialize_oracle.sql

Note that the performance enhancements for MySQL are currently in the addCvssVersions branch (https://github.com/jeremylong/DependencyCheck/tree/addCvssVersions) as the performance enhancement changes are combined with other breaking changes to enhance the CVSS meta data. If you are testing the stored procedures you should use the addCvssVersions branch.

@aikebah
Copy link
Collaborator

aikebah commented Oct 4, 2020

@jeremylong I'll try and see how far I can get on this with the help of a Dockerized Oracle XE 18 database

@aikebah
Copy link
Collaborator

aikebah commented Oct 8, 2020

Have an apparently working schema, but not yet satisfied by its (lack of) speed.

On my containerized Oracle XE (18.4.0) initializing the DB from scratch with the NVD data takes 3.536.820ms - almost an hour.

Can anyone tell what the 5.3.2 performance used to be for an Oracle initialization?

@aikebah
Copy link
Collaborator

aikebah commented Oct 8, 2020

Also needed some dirty hacks in CveDB (oracle specific codepaths) to get it running with my schema
The preliminary steps can be found in my repo under branch issue-2755. That branch should be considered a private branch not to be used to build upon - I reserve all rights to rebase-at-will and rewrite its history, just want to give others the chance to comment and suggest better approaches if they spot them.

@jeremylong
Copy link
Owner Author

The performance on the update for any of the external databases is not great. I can't believe they were used before 6.x as we've had reports of updates taking 5-24 hours. One hour update time isn't horrible considering what it was before. I think people just had one node setup to update the external DB and after the initial load if they updated daily/weekly it would go much faster.

@jeremylong
Copy link
Owner Author

To avoid some of the oracle specific code - instead of using an OUT parameter in update_vulnerability I think you could just return a SELECT vulnerabilityId? Something like:

CREATE OR REPLACE PROCEDURE update_vulnerability(p_cveId IN vulnerability.cve%type,
                                      ...
                                      cursor_ OUT SYS_REFCURSOR)
                                       AS
vulnerabilityId number;
BEGIN
...
OPEN cursor_ FOR
      SELECT vulnerabilityId

I know that isn't 100% right - but with some tweaking might work. Thoughts?

@aikebah
Copy link
Collaborator

aikebah commented Oct 9, 2020

I tried returning, but then Oracle needs a function - stored procedures in Oracle cannot return other then via an out parameter.... and the function also requires similar call structure from what I found (tried the function approach first

will keep trying to find a better way, but first need to find out why currently no CPE gets registered while update-only runs successfully

@aikebah
Copy link
Collaborator

aikebah commented Oct 14, 2020

Managed to get cleaner code by using stored-procedure with OUT parameter. Still need some Oracle custom code there (additional parameter and need to use CallableStatement instead of prepared statement because of the OUT parameter), but in a much less ugly way. Currently testing my solution after validating curious to see what the timing of the initial load is going to be now that I have properly restored the creation of cpeEntry and software in the insert_software Stored procedure.

@aikebah
Copy link
Collaborator

aikebah commented Oct 14, 2020

Results of my load test:

[INFO] Processing Complete for NVD CVE - 2003  (399089 ms)
[INFO] Processing Started for NVD CVE - 2010
[INFO] Processing Complete for NVD CVE - 2004  (818222 ms)
[INFO] Processing Started for NVD CVE - 2011
[INFO] Processing Complete for NVD CVE - 2005  (1728594 ms)
[INFO] Processing Started for NVD CVE - 2012
[INFO] Processing Complete for NVD CVE - 2009  (1879614 ms)
[INFO] Processing Started for NVD CVE - 2013
[INFO] Processing Complete for NVD CVE - 2002  (2188659 ms)
[INFO] Processing Started for NVD CVE - 2014
[INFO] Processing Complete for NVD CVE - 2010  (2277047 ms)
[INFO] Processing Started for NVD CVE - 2015
[INFO] Processing Complete for NVD CVE - 2007  (2920062 ms)
[INFO] Processing Started for NVD CVE - 2016
[INFO] Processing Complete for NVD CVE - 2011  (2264389 ms)
[INFO] Processing Started for NVD CVE - 2017
[INFO] Processing Complete for NVD CVE - 2006  (3192293 ms)
[INFO] Processing Started for NVD CVE - 2018
[INFO] Processing Complete for NVD CVE - 2008  (3243295 ms)
[INFO] Processing Started for NVD CVE - 2019
[INFO] Processing Complete for NVD CVE - 2012  (2490269 ms)
[INFO] Processing Started for NVD CVE - 2020
[INFO] Processing Complete for NVD CVE - 2013  (2510618 ms)
[INFO] Processing Complete for NVD CVE - 2015  (2271651 ms)
[INFO] Processing Complete for NVD CVE - 2014  (2772401 ms)
[INFO] Processing Complete for NVD CVE - 2016  (2200262 ms)
[INFO] Processing Complete for NVD CVE - 2020  (1383692 ms)
[INFO] Processing Complete for NVD CVE - 2017  (2534081 ms)
[INFO] Processing Complete for NVD CVE - 2019  (2420155 ms)
[INFO] Processing Complete for NVD CVE - 2018  (2488306 ms)
[INFO] Download Started for NVD CVE - Modified
[INFO] Download Complete for NVD CVE - Modified  (1156 ms)
[INFO] Processing Started for NVD CVE - Modified
[INFO] Processing Complete for NVD CVE - Modified  (17250 ms)
[INFO] Begin database maintenance
[INFO] Updated the CPE ecosystem on 1 NVD records
[INFO] Removed the CPE ecosystem on 1 NVD records
[INFO] Cleaned up 1 orphaned NVD records
[INFO] End database maintenance (53373 ms)
[INFO] Skipping RetireJS update since last update was within 24 hours.
[INFO] Check for updates complete (5771559 ms)
....
[INFO] Total time:  01:36 h

record-stats of the database:

cpeEcosystemCache	16511
cpeEntry	205164
cweEntry	116864
properties	23
reference	575449
software	1005817
vulnerability	115426

@aikebah
Copy link
Collaborator

aikebah commented Oct 14, 2020

@jeremylong Any recommendations on how to best validate this Oracle database schema operates correctly?

@jeremylong
Copy link
Owner Author

I'm a little suspicious of:

[INFO] Begin database maintenance
[INFO] Updated the CPE ecosystem on 1 NVD records
[INFO] Removed the CPE ecosystem on 1 NVD records
[INFO] Cleaned up 1 orphaned NVD records

Although - that could be a difference in how the "counts" are retrieved on oracle vs. other DBs. for instance, on an initial load of the data there should not be any orphaned NVD records.

The testing I've done is compare executions using H2 in comparison to MySQL/Postgresql/etc.

@aikebah
Copy link
Collaborator

aikebah commented Oct 17, 2020

@jeremylong The count of 1 is indeed due to different Oracle behaviour. Getting the update count would require adding an out-parameter to the stored procedure(s). So would require an additional oracle-special branch (in the body of cleanupDatabase) similar to the logic already added to updateOrInsertVulnerability for the insert_software stored procudure.

Would be easy to integrate. Do you consider having the proper counts more beneficial than having a single code-path? Would be happy to adapt my code for it if so.

Oracle code and schema still need some further debugging, as I managed to get logical duplicates into cpeEntry in the Oracle database (all fields apart from ID identical) - raising an error for multiple results instead of one during update for some integration tests of the maven plugin when I ran a build modified to use my oracle database:

Caused by: java.sql.BatchUpdateException: ORA-01422: exact fetch returns more than requested number of rows
ORA-06512: at "DCUSER.INSERT_SOFTWARE", line 42
ORA-06512: at "DCUSER.INSERT_SOFTWARE", line 23

A quick query on the database shows that I managed to create 19 duplicate CPEs.

@aikebah
Copy link
Collaborator

aikebah commented Oct 17, 2020

Looks like a side-effect of the parallel test execution. I think some kind of in-database lock-for-updating at the start of the update is required for the 'remote' database instances (all but the in-VM h2 database) to avoid 2 updates on separate JVMs from trying to mutate the database in parallel.

Nevertheless added the 'UNIQUE' constraint to the index for all functional fields of cpeEntry to at least avoid this from ever happening again - will run a new IT test later to verify that I indeed see some testcases fail on attempts to add a duplicate cpeEntry.

@jeremylong
Copy link
Owner Author

Regarding the counts being one for the three INFO statements that are updating the ecosystems and purging orphans. I might have a solution - instead of using stored procs such as:

https://github.com/aikebah/DependencyCheck/blob/d9f49695880eb3438ffdba15b2d97a6ef0f995b2/core/src/main/resources/data/initialize_oracle.sql#L189-L192

and

https://github.com/aikebah/DependencyCheck/blob/d9f49695880eb3438ffdba15b2d97a6ef0f995b2/core/src/main/resources/data/initialize_oracle.sql#L401-L417

Because these are single line stored procs the SQL could be moved to the dbStatements_oracle.properties under the appropriate keys.

@aikebah
Copy link
Collaborator

aikebah commented Oct 18, 2020

Right on that one.... overlooked the opportunity. Updated the queries and the initialization script. Currently running a new full DB load to verify results.

@aikebah
Copy link
Collaborator

aikebah commented Oct 18, 2020

Database initialization verified against a fresh Postgresql database.
Initial load of Oracle DB (running in a DockerContainer running on a VM on the same host that DependencyCheck runs on) takes approximately one hour.

[INFO] Processing Complete for NVD CVE - 2020  (842678 ms)
[INFO] Processing Complete for NVD CVE - 2017  (1544558 ms)
[INFO] Processing Complete for NVD CVE - 2019  (1545254 ms)
[INFO] Processing Complete for NVD CVE - 2018  (1562712 ms)
[INFO] Download Started for NVD CVE - Modified
[INFO] Download Complete for NVD CVE - Modified  (2248 ms)
[INFO] Processing Started for NVD CVE - Modified
[INFO] Processing Complete for NVD CVE - Modified  (13533 ms)
[INFO] Begin database maintenance
[INFO] Updated the CPE ecosystem on 7647 NVD records
[INFO] Removed the CPE ecosystem on 3769 NVD records
[INFO] End database maintenance (39828 ms)
[INFO] Skipping RetireJS update since last update was within 24 hours.
[INFO] Check for updates complete (3575667 ms)
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  59:38 min

@aikebah
Copy link
Collaborator

aikebah commented Oct 18, 2020

Analysis run for the Oracle DB is significantly slower (2.5-3 times for Jim Seller's multimodule project created to sample the slow h2 behaviour) than postgreSQL. Will look and see if I can find ways to improve there.

@aikebah
Copy link
Collaborator

aikebah commented Oct 18, 2020

Managed to get performance up a notch by increasing Oracle's default fetch-size of 10 records closer to the other DBMS's default of 'fetch all'. Decided to keep Oracle a bit more sane and limit it to 10_000 entries for a single fetch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants