Some Data Freeze/Thaw Issues #13

Closed
zachcp opened this issue Aug 13, 2020 · 3 comments

zachcp commented Aug 13, 2020

Hi @huahaiy ,

I love the idea of Datalevin and have been thinking about generating DBs for scientific data that I could then distribute (i.e. something like SQLite, but for Datalog). I tried Datalevin on a download from the NPAtlas and ran into a few errors/bugs that I am hoping you can help me resolve.

Errors Observed

I've posted a hopefully reproducible example as a gist. In this example, after I process the file and load it into a DB, I run into a few errors when querying.

  1. Fail to get-value: "Thaw failed against type-id: 78" (when using a schema)
  2. Fail to get-value: "Thaw failed against type-id: 16" (using empty schema)
  3. Empty Set when expecting results (using empty schema)

Not all queries are broken, but when I query for :smiles I run into issues. These values are strings with a lot of special characters, and I wonder if some string escaping is happening during the freeze. I was able to nippy-freeze/thaw them directly, so that guess could be wrong, but it is the best one I have.
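The round-trip check mentioned above can be sketched roughly as follows. This is a minimal sketch, assuming the `taoensso.nippy` library is on the classpath; the string is a made-up SMILES-like value chosen for its special characters, not an entry from NPAtlas.

```clojure
(require '[taoensso.nippy :as nippy])

;; Hypothetical SMILES-like string (not taken from the NPAtlas data).
;; The backslash is escaped for the Clojure reader.
(def smiles "C/C=C\\C[C@@H](O)C(=O)[O-]")

;; freeze serializes the value to a byte array; thaw reads it back.
(def round-tripped (nippy/thaw (nippy/freeze smiles)))

;; True when the string survives serialization unchanged.
(= smiles round-tripped)
```

If this comparison holds for the strings that fail in queries, plain Nippy serialization of the values is probably not the cause, and the problem is more likely in how the values are stored or looked up.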

System Environment

I'm on OSX with Clojure 1.10.1, Java 8, and Datalevin 0.2.16

(System/getProperty "java.vm.version")
=> "25.192-b01"

huahaiy commented Aug 13, 2020

Thanks for the report. I can reproduce the error, and will look into the problem.

BTW, when you load the data, you should load it all in a single transaction, which is much faster.

(def data (read-np-atlas))

;; Clean each entity and assign it a distinct negative :db/id.
(def txs (map-indexed
          (fn [i e]
            (-> e
                (remove-if-empty :external_ids)
                (flatten-external-ids)
                (dissoc :origin_organism :origin_reference :reassignments :syntheses :node_id :cluster_id)
                (assoc :db/id (- (inc i)))))
          data))

(def conn (d/create-conn np-schema "/tmp/npatlas"))

(time (d/transact! conn txs))
;;=>"Elapsed time: 9227.012203 msecs"
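
The id assignment in the snippet above can be tried on its own with plain Clojure. This is a minimal sketch using made-up entities; it assumes, as in DataScript, that negative :db/id values are treated as temporary ids within a transaction. The names `sample-entities` and `sample-txs` are hypothetical.

```clojure
;; Made-up entities standing in for the NPAtlas records.
(def sample-entities [{:name "a"} {:name "b"} {:name "c"}])

;; Index i starts at 0, so (- (inc i)) yields -1, -2, -3, ...
(def sample-txs
  (map-indexed (fn [i e] (assoc e :db/id (- (inc i))))
               sample-entities))

(mapv :db/id sample-txs)
;; => [-1 -2 -3]
```

Because every entity gets its own distinct id up front, the whole sequence can be handed to one `d/transact!` call instead of one call per entity.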

huahaiy added a commit that referenced this issue Aug 14, 2020

huahaiy commented Aug 14, 2020

Release 0.2.17 fixes this issue. Thanks.

huahaiy closed this as completed Aug 14, 2020

zachcp commented Aug 14, 2020

Fantastic. Thank You.

den1k mentioned this issue Sep 4, 2020