fixing bug with tabs in log line for MDC #10416

Merged
sekmiller merged 1 commit into develop from 255-operationalize-mdc-fix-log-formatting
Mar 27, 2024

Conversation

stevenwinship
Contributor

@stevenwinship stevenwinship commented Mar 22, 2024

Log lines for MDC (Make Data Count) must be tab-delimited, but in some cases the field data itself contains tabs, which throws off the columns.
This PR replaces those tabs with spaces in MakeDataCountLoggingServiceBean.

In the example below, the authors column (14) contains 2 tabs, which adds 2 extra columns and causes an error in counter_processor:
2024-03-21 15:17:49 importing log/counter_2024-02-01.log
line is wrong: 2024-02-01T00:13:30-0500 98.121.236.139 - - :guest https://dataverse.harvard.edu/api/v1/datasets/export?exporter=schema.org&persistentId=doi%3A10.7910%2FDVN%2FJ1UD6S doi:10.7910/DVN/J1UD6S - - python-requests/2.28.2 Main Cities grid tbd 00_metadata 00_metadata 100% 10 Digital Map Database of China 已启用屏幕阅读器支持。 Digital Map Database of China 2020-02-15T21:35:14Z 1 - https://dataverse.harvard.edu/api/v1/datasets/export?exporter=schema.org&persistentId=doi%3A10.7910%2FDVN%2FJ1UD6S 2020
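
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class `TabSafeLogLine`, its method names, and the use of `-` as a placeholder for missing values are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; this is NOT the actual
// MakeDataCountLoggingServiceBean implementation.
final class TabSafeLogLine {

    private TabSafeLogLine() {}

    // Replace every tab inside a field value with a space so the value
    // cannot introduce extra columns. Missing values become "-"
    // (an assumption based on the "-" entries in the sample line).
    static String sanitize(String field) {
        if (field == null || field.isEmpty()) {
            return "-";
        }
        return field.replace('\t', ' ');
    }

    // Build one tab-delimited log line from already-ordered field values.
    static String join(List<String> fields) {
        return fields.stream()
                .map(TabSafeLogLine::sanitize)
                .collect(Collectors.joining("\t"));
    }

    public static void main(String[] args) {
        // The second value contains an embedded tab, as in the authors column above.
        List<String> fields = Arrays.asList(
                "doi:10.1234/EXAMPLE", "First Author\tSecond Author", null);
        String line = join(fields);
        System.out.println(line.split("\t", -1).length); // prints 3
    }
}
```

Sanitizing each value before joining keeps the column count equal to the number of fields, whatever the metadata contains.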

Which issue(s) this PR closes:

Needed for IQSS/dataverse.harvard.edu#3

Special notes for your reviewer:

Suggestions on how to test this:
Make sure the log file has no extra tabs
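
One way to automate this check is to count the columns on every line of a generated log file. The sketch below is only a suggestion: the class `LogColumnCheck` is hypothetical, the expected column count is passed in by the caller rather than assumed, and skipping lines that start with `#` assumes the file begins with a commented header.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test helper; not part of Dataverse or counter-processor.
final class LogColumnCheck {

    private LogColumnCheck() {}

    // Returns the 1-based numbers of all lines whose tab-delimited
    // column count differs from expectedColumns.
    static List<Integer> linesWithWrongColumnCount(List<String> lines, int expectedColumns) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            // Assumption: blank lines and '#' header lines are not data rows.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // A limit of -1 keeps trailing empty columns in the count.
            int columns = line.split("\t", -1).length;
            if (columns != expectedColumns) {
                bad.add(i + 1);
            }
        }
        return bad;
    }
}
```

The lines could be loaded with `java.nio.file.Files.readAllLines(...)` from a file such as `log/counter_2024-02-01.log`; an empty result means no line has extra tabs.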

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No

Is there a release notes update needed for this change?: No

Additional documentation: None

@stevenwinship stevenwinship self-assigned this Mar 22, 2024
@stevenwinship stevenwinship added the following labels on Mar 22, 2024:
- NIH OTA: 1.5.1 (collection: 5 | 1.5.1 | Standardize download metrics for the Harvard Dataverse repository...)
- Size: 3 (A percentage of a sprint. 2.1 hours.)
- pm.GREI-d-1.5.1 (NIH, yr1, aim5, task1: Standardize download metrics)
- pm.GREI-d-1.5.2 (NIH, yr1, aim5, task2: WG with other repositories to follow Make Data Count recommendations)
@stevenwinship stevenwinship added this to the 6.2 milestone Mar 22, 2024
@coveralls

Coverage Status

coverage: 20.66% (+0.001%) from 20.659%
when pulling 2129555 on 255-operationalize-mdc-fix-log-formatting
into 30666f9 on develop.

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:255-operationalize-mdc-fix-log-formatting
ghcr.io/gdcc/configbaker:255-operationalize-mdc-fix-log-formatting

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

Member

@qqmyers qqmyers left a comment

Makes sense. This gets rid of the tabs that cause issues, and I don't think any of this info is used outside counter-processor, so any transform that doesn't result in false matches should be OK (and we wouldn't expect values differing only by whitespace to really be different as far as counter-processor cares).

Contributor

@landreev landreev left a comment

Makes sense. I was going to approve and ask @qqmyers to take a look anyway, just in case, but he appears to have beaten me to it.

@sekmiller sekmiller self-assigned this Mar 26, 2024
@sekmiller sekmiller merged commit 472bb3d into develop Mar 27, 2024
19 checks passed
@stevenwinship stevenwinship linked an issue Mar 27, 2024 that may be closed by this pull request
@sekmiller sekmiller deleted the 255-operationalize-mdc-fix-log-formatting branch March 27, 2024 15:46
Projects
Status: Done 🧹
Development

Successfully merging this pull request may close these issues.

Operationalize MDC: Fix log formatting
5 participants