adapt CSV parsing to new data model #315
Comments
Relevant code:
@mfenner, here are some observations / assumptions I made today exploring the code...

1. All source models (except Mendeley) inherit their …
2. Three individual source CSV reports – Mendeley, Counter, Pmc – need additional information that the other source CSV reports do not require. My assumption here is that once generic source CSV reports are working again (e.g. …).
3. Do you want the new columns (readers, html, counts) to be pushed up and included in the aggregate ALM report? If so, I'll plan on updating the code that generates the ALM report to handle that.

Let me know if I'm heading down the right track for the intended goals of this issue or if I'm missing anything. Thanks!
1. Yes, fixing the CSV generation should be straightforward. Two other sources (pmc, counter) also have a …
2. Yes.
3. Yes, the aggregate report used to have separate columns for html, pdf, etc. where it mattered.

Forgot to include a link to a sample report: http://figshare.com/articles/Cumulative_PLOS_ALM_Report_February_2014/1189396 (BTW, Figshare is similar to Zenodo in functionality).

The only other comment I would make is that I have always found the code used for report generation a bit obscure, e.g. the giant SQL call in …
Yeah, I can see that. I think there's an intermingling of report-related concepts in the Report model. It took me a little bit to mentally separate out that the class-level reports generated for the CSV aren't related to the kinds of reports that the Report model represents, and that the Report class has other unrelated service-like methods (all of the …).

As I work on getting the CSV reports back in place I'm going to put a little thought into how to refactor and further the separation of those concepts.
Great. |
@mfenner, I was looking at the reports.json from 691d4a6 to make sure I had an accurate understanding of what those views were doing. When generating a format-specific CSV source report, do the event's date and count sums correspond with the year/date and sums in …?
Yes, the … We actually generate several different reports (see https://github.com/articlemetrics/lagotto/blob/master/lib/tasks/report.rake). The only one that we ever made public, and the most important one, is the summary report, generated by the …

Part of this is historic. Before there was Ruby code these reports were generated by a combination of Perl, R and some manual merging. The intermediate reports are useful, but strictly speaking no longer needed for the …
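The merge step described above – combining the per-source intermediate reports into one summary report keyed by identifier – can be sketched roughly like this. This is a minimal, hypothetical illustration, not Lagotto's actual implementation; `merge_reports` and the column names are assumed for the example:

```ruby
require "csv"

# Merge several per-source CSV reports (each with a shared "pid" key
# column) into a single summary CSV with one row per pid.
# Illustrative sketch only; not the project's real merge code.
def merge_reports(*csv_strings)
  merged = Hash.new { |h, k| h[k] = {} }
  headers = ["pid"]
  csv_strings.each do |csv|
    table = CSV.parse(csv, headers: true)
    # Collect each source's metric columns in the order encountered.
    (table.headers - ["pid"]).each { |h| headers << h }
    table.each do |row|
      merged[row["pid"]].merge!(row.to_h.reject { |k, _| k == "pid" })
    end
  end
  CSV.generate do |out|
    out << headers
    # Missing values default to 0, since not every work appears in
    # every source report.
    merged.each { |pid, cols| out << [pid, *headers.drop(1).map { |h| cols[h] || 0 }] }
  end
end

mendeley = "pid,mendeley_readers\n10.1371/x,5\n"
counter  = "pid,counter_html,counter_pdf\n10.1371/x,10,2\n"
puts merge_reports(mendeley, counter)
```

Because the merge is keyed by pid, source reports can be generated independently and in any order, which matches how the old Perl/R pipeline assembled the summary.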
The core work on this is almost completed (see PR #351). I'm nearing completion on updating …

I haven't looked at the scheduling code yet, but I'm anticipating that it will either not need to change at all or will need to change only minimally.
For each work the reports in CSV format collect total counts for every source. For three sources (mendeley, counter, pmc) we need additional information (readers, html, and pdf counts, respectively). Before Lagotto 4.0 this was done using a `to_csv` method in the respective model. With Lagotto 4.0, and the switch to store all source data in MySQL instead of CouchDB, this functionality is broken.
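The shape of such a per-source `to_csv` could look like the sketch below: a generic report emits pid and total, while sources like counter append their extra columns. This is a self-contained illustration under assumed names (`SourceReport`, `extra_columns`) and is not the actual Lagotto model code:

```ruby
require "csv"

# Hypothetical per-source CSV report. Each row is a hash like
# { pid: "10.1371/...", total: 12, html: 10, pdf: 2 }.
class SourceReport
  def initialize(name, rows, extra_columns: [])
    @name = name
    @rows = rows
    @extra_columns = extra_columns
  end

  # Generic reports emit pid and total; sources needing extra detail
  # (mendeley readers, counter/pmc html and pdf) append those columns.
  def to_csv
    CSV.generate do |csv|
      csv << ["pid", "#{@name}_total", *@extra_columns.map { |c| "#{@name}_#{c}" }]
      @rows.each do |row|
        csv << [row[:pid], row[:total], *@extra_columns.map { |c| row[c] }]
      end
    end
  end
end

counter = SourceReport.new("counter",
  [{ pid: "10.1371/journal.pone.0000001", total: 12, html: 10, pdf: 2 }],
  extra_columns: [:html, :pdf])
puts counter.to_csv
```

With the move from CouchDB documents to MySQL rows, the equivalent data would presumably come from ActiveRecord queries rather than in-memory hashes, but the column layout per source stays the same.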