query re iterative data #1039

Closed
KyleHaynes opened this issue Oct 22, 2020 · 4 comments

@KyleHaynes

Thanks for such useful software.

Sorry if this question is extremely daft but from my brief usage of the software and scanning through associated documentation, I can't easily seem to see a simple solution.

Say I receive data x from a data custodian on a monthly basis (with just new records appended). Each time, I want to import and store it in Data Curator (mainly for the benefits of validation); however, I can't seem to easily apply the metadata JSON schema that was originally created.

I would have thought the process would be to 1. import csv data (easy enough to do) and then 2 file > import column properties > json from file ...
And select the schema from the original zip file (unzipped obviously)... but doing so just indicates it's not a valid schema.

Otherwise, I would have thought it might be possible from importing package properties, but the only option is from a URL, and the data that I'm interested in adding will never be uploaded to any internal/external URL.

Here is some dummy data if interested in poking about ...
ss_c1_export.zip

@ghost

ghost commented Oct 26, 2020

Hi @KyleHaynes
I'll have a look at the data when I can and see what the problem might be with the schema or the application.
But just to make sure I've understood the issue:

  1. "I would have thought it might be possible from importing package properties, but the only option is from a URL". Yes, unfortunately we used up our current scope on the existing 'Import' options and didn't have time to add others (Table, and Package by file). With more funding to the project, I'm hoping to be able to add these features to complete this.
  2. "doing so just indicates it's not a valid schema". So if you're importing Column properties, I'll have a look at the schema and see where the issue might be. The lower-level errors that come back from the libraries we use can be difficult to interpret, depending on what the error is, but in a future release maybe we can look at adding an option to display these errors in the event that someone does want to dig further.

If there isn't a problem with the application itself, there might be some other ways to try to use the data (But again please let me know if I haven't quite understood the use case here):

  1. If you already have the original data and the original schema (i.e. including the Column properties):
  • Open a new tab with the newest data.
  • Copy and paste this data (e.g., 'Select All' and 'Copy') from the new, second table over the existing one in the first tab.
  • If the header row was 'locked' ('Tools' -> 'Header Row') in either of the tables, and the headers are different, you will need to unlock the table header to include it in any 'Copy' or 'Paste'.
    That way it should keep the existing column properties, which you can then apply to the new data.
  2. If you have the unzipped new data available in a folder and you have the original table and schema already in Data Curator, you could:
  • Clear the existing data from the table (say 'Select All' and 'Delete'); again, ensure that the header row is unlocked to include it in the 'Delete'.
  • Drag and drop the file from your unzipped folder into the blank tab.
    This should also keep the original schema, as it is just the data that you have switched.
  3. It's not clear to me yet (once I have a look at the example you've supplied I might know more) whether the schema is a valid frictionless schema and is not being imported just because the data does not match the schema. If that is the reason, then another way to allow the import might be:
  • Start Data Curator without any existing data or schemas.
  • Check the number of columns in the schema that you have and add these so that Data Curator has the correct number of columns.
  • Now import the schema.
  • Open the CSV file (or drag and drop the file in).
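
For the third option, the number of columns to add can be read from the schema file itself, and the same information can be used to check a new monthly CSV before opening it. A minimal sketch in Python, assuming the file is a standalone Frictionless Table Schema (a JSON object with a top-level `fields` array); the function names and sample data are illustrative and are not part of Data Curator:

```python
import csv
import io
import json


def schema_field_names(schema_text):
    """Return the column names declared in a Table Schema JSON string."""
    schema = json.loads(schema_text)
    return [field["name"] for field in schema["fields"]]


def header_mismatches(csv_text, field_names):
    """Compare a CSV header row against the schema's field names.

    Returns (missing, extra): names in the schema but not in the CSV,
    and names in the CSV but not in the schema.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = [name for name in field_names if name not in header]
    extra = [name for name in header if name not in field_names]
    return missing, extra


# Hypothetical sample data, for illustration only.
sample_schema = (
    '{"fields": [{"name": "id", "type": "integer"},'
    ' {"name": "name", "type": "string"}]}'
)
sample_csv = "id,name\n1,Ann\n2,Bob\n"

names = schema_field_names(sample_schema)
print(len(names), "columns expected:", names)
print(header_mismatches(sample_csv, names))
```

Two empty lists from `header_mismatches` mean the header row and the schema agree on column names; it does not check the values in each row.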

Let me know if any of these cases help or if there is more detail for me to consider here.
Although it doesn't cover the case you have, the 'Help' menu does also offer some basics about the use of Data Curator that might be useful.

The use case you've raised here, though, is a new one for me. If the schema that you have is a valid frictionless schema (i.e. regardless of what the data may be), it would be useful for Data Curator to still import it, no matter what state the data is in. Once I've tested your example, Kyle, I'll add more to this.

@ghost

ghost commented Oct 26, 2020

Hi @KyleHaynes
I've had a go at importing the column properties and it seems to have succeeded. Not sure if you've seen the frictionless documentation (from the help message you mentioned in Data Curator), but basically I removed the outer JSON for the package and table, so that the file contains only what sits under the `schema` key, without the key name itself.
Data Curator will flag that it needs a certain number of columns if it doesn't match, but that's usually an indicator that it recognises the schema and it's just a matter of adding the number of blank columns required.
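
The manual edit described above can also be scripted. A minimal sketch in Python, assuming the exported `datapackage.json` follows the Frictionless Data Package layout (a top-level `resources` array whose entries each carry a `schema` object); the function name, the file names in the comments, and the sample descriptor are illustrative assumptions and are not taken from the attached export:

```python
import json


def extract_table_schema(package, resource_index=0):
    """Return only the schema object of one resource in a data package."""
    return package["resources"][resource_index]["schema"]


# Hypothetical descriptor standing in for an exported datapackage.json.
sample_package = {
    "name": "example-package",
    "resources": [
        {
            "name": "example-table",
            "path": "data/example.csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer"},
                    {"name": "name", "type": "string"},
                ]
            },
        }
    ],
}

schema = extract_table_schema(sample_package)
print(json.dumps(schema, indent=2))
print("columns required:", len(schema["fields"]))

# With real files (paths are placeholders):
# with open("datapackage.json") as src:
#     package = json.load(src)
# with open("schema.json", "w") as dst:
#     json.dump(extract_table_schema(package), dst, indent=2)
```

The printed column count is the number of columns the table needs before the schema is imported.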
I've added your example schema to Data Curator's test fixtures, here: a copy of how the JSON looks as just a schema (as opposed to the original datapackage.json you supplied).
Hope it helps.

@KyleHaynes
Author

@mattRedBox - thanks a lot for the detailed reply - this has worked perfectly.
Cheers
Kyle

@ghost

ghost commented Oct 29, 2020

No problem @KyleHaynes. Glad it worked.
