Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Critical DB issue #33

Closed
meren opened this issue Feb 8, 2015 · 8 comments
Closed

Critical DB issue #33

meren opened this issue Feb 8, 2015 · 8 comments
Labels

Comments

@meren
Copy link
Member

meren commented Feb 8, 2015

Well, I am getting this error with large files that do not happen with smaller ones.

This may be a db.commit() issue. Needs to be checked properly:

meren SSH://MBL /workspace/shared/tom/Infant-gut-FASTA-files $ papi-populate-search-table Infant-gut-assembly-1kb.fa Infant-gut-assembly-1kb.db -L 20000
Database .....................................: A new database, Infant-gut-assembly-1kb.db, has been created.
Split length .................................: 20000
HMM profiles .................................: 3 sources have been loaded: Dupont_et_al (111 genes), Campbell_et_al (139 genes), Wu_et_al (31 genes)

Finding ORFs in contigs
===============================================
Genes ........................................: /tmp/tmpqjiEz2/contigs.genes
Proteins .....................................: /tmp/tmpqjiEz2/contigs.proteins
Log file .....................................: /tmp/tmpqjiEz2/00_log.txt

HMM Profiling for Dupont_et_al
===============================================
Reference ....................................: Dupont et al, http://www.nature.com/ismej/journal/v6/n6/full/ismej2011189a.html
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Dupont_et_al/genes.hmm.gz
Number of genes ..............................: 111
Temporary work dir ...........................: /tmp/tmpYslaQa
HMM scan output ..............................: /tmp/tmpYslaQa/hmm.output
HMM scan hits ................................: /tmp/tmpYslaQa/hmm.hits
Log file .....................................: /tmp/tmpYslaQa/00_log.txt
Number of raw hits ...........................: 3,945

HMM Profiling for Campbell_et_al
===============================================
Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Temporary work dir ...........................: /tmp/tmpI3mhZw
HMM scan output ..............................: /tmp/tmpI3mhZw/hmm.output
HMM scan hits ................................: /tmp/tmpI3mhZw/hmm.hits
Log file .....................................: /tmp/tmpI3mhZw/00_log.txt
Number of raw hits ...........................: 2,364

HMM Profiling for Wu_et_al
===============================================
Reference ....................................: Wu et al, http://genomebiology.com/2008/9/10/R151
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Wu_et_al/genes.hmm.gz
Number of genes ..............................: 31
Temporary work dir ...........................: /tmp/tmpz_f6JE
HMM scan output ..............................: /tmp/tmpz_f6JE/hmm.output
HMM scan hits ................................: /tmp/tmpz_f6JE/hmm.hits
Log file .....................................: /tmp/tmpz_f6JE/00_log.txt
Number of raw hits ...........................: 946
Traceback (most recent call last):
  File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 118, in <module>
    main(args)
  File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 78, in main
    g.populate_search_tables(annotation_db, sources)
  File "/groups/merenlab/PaPi/PaPi/annotation.py", line 179, in populate_search_tables
    search_tables.append(source, reference, kind_of_search, all_genes_searched_against, search_results_dict)
  File "/groups/merenlab/PaPi/PaPi/annotation.py", line 261, in append
    self.db.create_table(self.search_info_table, search_info_table_structure, search_info_table_types)
  File "/groups/merenlab/PaPi/PaPi/db.py", line 68, in create_table
    self._exec('''CREATE TABLE %s (%s)''' % (table_name, db_fields))
  File "/groups/merenlab/PaPi/PaPi/db.py", line 110, in _exec
    ret_val = self.cursor.execute(sql_query)
sqlite3.OperationalError: table search_info already exists
@meren meren added the bug label Feb 8, 2015
@annaship
Copy link

annaship commented Feb 8, 2015

The query is "create table..." A

-----Original Message-----
From: "A. Murat Eren" notifications@github.com
Sent: ‎2/‎8/‎2015 12:17 AM
To: "meren/PaPi" PaPi@noreply.github.com
Subject: [PaPi] Critical DB issue (#33)

Well, I am getting this error with large files that do not happen with smaller ones.
This may be a db.commit() issue. Needs to be checked properly:
meren SSH://MBL /workspace/shared/tom/Infant-gut-FASTA-files $ papi-populate-search-table Infant-gut-assembly-1kb.fa Infant-gut-assembly-1kb.db -L 20000
Database .....................................: A new database, Infant-gut-assembly-1kb.db, has been created.
Split length .................................: 20000
HMM profiles .................................: 3 sources have been loaded: Dupont_et_al (111 genes), Campbell_et_al (139 genes), Wu_et_al (31 genes)

Finding ORFs in contigs

Genes ........................................: /tmp/tmpqjiEz2/contigs.genes
Proteins .....................................: /tmp/tmpqjiEz2/contigs.proteins
Log file .....................................: /tmp/tmpqjiEz2/00_log.txt

HMM Profiling for Dupont_et_al

Reference ....................................: Dupont et al, http://www.nature.com/ismej/journal/v6/n6/full/ismej2011189a.html
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Dupont_et_al/genes.hmm.gz
Number of genes ..............................: 111
Temporary work dir ...........................: /tmp/tmpYslaQa
HMM scan output ..............................: /tmp/tmpYslaQa/hmm.output
HMM scan hits ................................: /tmp/tmpYslaQa/hmm.hits
Log file .....................................: /tmp/tmpYslaQa/00_log.txt
Number of raw hits ...........................: 3,945

HMM Profiling for Campbell_et_al

Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Temporary work dir ...........................: /tmp/tmpI3mhZw
HMM scan output ..............................: /tmp/tmpI3mhZw/hmm.output
HMM scan hits ................................: /tmp/tmpI3mhZw/hmm.hits
Log file .....................................: /tmp/tmpI3mhZw/00_log.txt
Number of raw hits ...........................: 2,364

HMM Profiling for Wu_et_al

Reference ....................................: Wu et al, http://genomebiology.com/2008/9/10/R151
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Wu_et_al/genes.hmm.gz
Number of genes ..............................: 31
Temporary work dir ...........................: /tmp/tmpz_f6JE
HMM scan output ..............................: /tmp/tmpz_f6JE/hmm.output
HMM scan hits ................................: /tmp/tmpz_f6JE/hmm.hits
Log file .....................................: /tmp/tmpz_f6JE/00_log.txt
Number of raw hits ...........................: 946
Traceback (most recent call last):
File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 118, in
main(args)
File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 78, in main
g.populate_search_tables(annotation_db, sources)
File "/groups/merenlab/PaPi/PaPi/annotation.py", line 179, in populate_search_tables
search_tables.append(source, reference, kind_of_search, all_genes_searched_against, search_results_dict)
File "/groups/merenlab/PaPi/PaPi/annotation.py", line 261, in append
self.db.create_table(self.search_info_table, search_info_table_structure, search_info_table_types)
File "/groups/merenlab/PaPi/PaPi/db.py", line 68, in create_table
self._exec('''CREATE TABLE %s (%s)''' % (table_name, db_fields))
File "/groups/merenlab/PaPi/PaPi/db.py", line 110, in _exec
ret_val = self.cursor.execute(sql_query)
sqlite3.OperationalError: table search_info already exists

Reply to this email directly or view it on GitHub.

@annaship
Copy link

annaship commented Feb 8, 2015

Oops, sorry. So, I was going to say, the query is "create table" and the error "table ... Exists". You can add "create table if not exists...".

-----Original Message-----
From: "A. Murat Eren" notifications@github.com
Sent: ‎2/‎8/‎2015 12:17 AM
To: "meren/PaPi" PaPi@noreply.github.com
Subject: [PaPi] Critical DB issue (#33)

Well, I am getting this error with large files that do not happen with smaller ones.
This may be a db.commit() issue. Needs to be checked properly:
meren SSH://MBL /workspace/shared/tom/Infant-gut-FASTA-files $ papi-populate-search-table Infant-gut-assembly-1kb.fa Infant-gut-assembly-1kb.db -L 20000
Database .....................................: A new database, Infant-gut-assembly-1kb.db, has been created.
Split length .................................: 20000
HMM profiles .................................: 3 sources have been loaded: Dupont_et_al (111 genes), Campbell_et_al (139 genes), Wu_et_al (31 genes)

Finding ORFs in contigs

Genes ........................................: /tmp/tmpqjiEz2/contigs.genes
Proteins .....................................: /tmp/tmpqjiEz2/contigs.proteins
Log file .....................................: /tmp/tmpqjiEz2/00_log.txt

HMM Profiling for Dupont_et_al

Reference ....................................: Dupont et al, http://www.nature.com/ismej/journal/v6/n6/full/ismej2011189a.html
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Dupont_et_al/genes.hmm.gz
Number of genes ..............................: 111
Temporary work dir ...........................: /tmp/tmpYslaQa
HMM scan output ..............................: /tmp/tmpYslaQa/hmm.output
HMM scan hits ................................: /tmp/tmpYslaQa/hmm.hits
Log file .....................................: /tmp/tmpYslaQa/00_log.txt
Number of raw hits ...........................: 3,945

HMM Profiling for Campbell_et_al

Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Temporary work dir ...........................: /tmp/tmpI3mhZw
HMM scan output ..............................: /tmp/tmpI3mhZw/hmm.output
HMM scan hits ................................: /tmp/tmpI3mhZw/hmm.hits
Log file .....................................: /tmp/tmpI3mhZw/00_log.txt
Number of raw hits ...........................: 2,364

HMM Profiling for Wu_et_al

Reference ....................................: Wu et al, http://genomebiology.com/2008/9/10/R151
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Wu_et_al/genes.hmm.gz
Number of genes ..............................: 31
Temporary work dir ...........................: /tmp/tmpz_f6JE
HMM scan output ..............................: /tmp/tmpz_f6JE/hmm.output
HMM scan hits ................................: /tmp/tmpz_f6JE/hmm.hits
Log file .....................................: /tmp/tmpz_f6JE/00_log.txt
Number of raw hits ...........................: 946
Traceback (most recent call last):
File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 118, in
main(args)
File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 78, in main
g.populate_search_tables(annotation_db, sources)
File "/groups/merenlab/PaPi/PaPi/annotation.py", line 179, in populate_search_tables
search_tables.append(source, reference, kind_of_search, all_genes_searched_against, search_results_dict)
File "/groups/merenlab/PaPi/PaPi/annotation.py", line 261, in append
self.db.create_table(self.search_info_table, search_info_table_structure, search_info_table_types)
File "/groups/merenlab/PaPi/PaPi/db.py", line 68, in create_table
self._exec('''CREATE TABLE %s (%s)''' % (table_name, db_fields))
File "/groups/merenlab/PaPi/PaPi/db.py", line 110, in _exec
ret_val = self.cursor.execute(sql_query)
sqlite3.OperationalError: table search_info already exists

Reply to this email directly or view it on GitHub.

@meren
Copy link
Member Author

meren commented Feb 8, 2015

Hi Anna,

Thank you for the comment. But the issue is a bit knottier than that.

The table is there, and the code should never branch into that line. Here where the check is being done:

https://github.com/meren/PaPi/blob/master/PaPi/annotation.py#L260

You can see in the line before, it gets the table names from the database, which is a simple operation:

https://github.com/meren/PaPi/blob/master/PaPi/db.py#L153

This error does not happen with runs that don't take too long. I think database connection times out, and it doesn't return table names properly or something. Because the table is there, it is just the code sometimes, especially during long runs can't recover that information from the database :/

@meren
Copy link
Member Author

meren commented Feb 8, 2015

Anna,

Would you say keeping every db operation atomic is a better practice instead of keeping the connection open for very long periods of time?

I am planning to change the database wrapper class to a much more atomic one, so every function will (1) open a connection, (2) query the db and collect results, (3) close connection. I am not sure how much overhead this will bring, but I feel like it can resolve this issue I am dealing with (if this approach is better, I will rely on aspect oriented programming to decorate db functions in https://github.com/meren/PaPi/blob/master/PaPi/db.py).

Best,

@annaship
Copy link

annaship commented Feb 8, 2015

Meren,

Yes, you check if the table exists with python, maybe to do that within the
db is good too.
That's your error message:

File "/groups/merenlab/PaPi/PaPi/db.py", line 68, in create_table
self._exec('''CREATE TABLE %s (%s)''' % (table_name, db_fields))
File "/groups/merenlab/PaPi/PaPi/db.py", line 110, in _exec
ret_val = self.cursor.execute(sql_query)
sqlite3.OperationalError: table search_info already exists

"OperationalError: table search_info already exists" will be gone if query
is

self._exec('''CREATE TABLE IF NOT EXISTS %s (%s)''' % (table_name,
db_fields))

I'm sure you are right and the real problem is different, but maybe you
will have more clear error message at least.

The second thing is a connection timeout.
You can do as you said and open and close a connection each time. Only
benchmark will tell you if it is much slower. It will be, probably,
especially if without it you can use "prepare". And it will not help if one
query is running too long.

There are several options you can play with.

  1. Increase timeout (
    https://docs.python.org/2/library/sqlite3.html#sqlite3.connect)

"
sqlite3.connect(database[, timeout, detect_types, isolation_level,
check_same_thread, factory, cached_statements]) Opens a connection to the
SQLite database file database. You can use ":memory:" to open a database
connection to a database that resides in RAM instead of on disk. When a
database is accessed by multiple connections, and one of the processes
modifies the database, the SQLite database is locked until that transaction
is committed. The timeout parameter specifies how long the connection
should wait for the lock to go away until raising an exception. The default
for the timeout parameter is 5.0 (five seconds).
"

  1. See how many connections Sqlite can open at the same time. Some info
    here, for example:
    http://stackoverflow.com/questions/3610775/how-many-connections-can-sqlite-3-handle

  2. Use multithreading,
    https://docs.python.org/2/library/sqlite3.html#multithreading

Not sure if it help, I'd like to know more about the db configuration.

Anya

On Sun, Feb 8, 2015 at 2:13 PM, A. Murat Eren notifications@github.com
wrote:

Anna,

Would you say keeping every db operation atomic is a better practice
instead of keeping the connection open for very long periods of time?

I am planning to change the database wrapper class to a much more atomic
one, every function will (1) open a connection, (2) query the db and
collect results, (3) close connection. I am not sure how much overhead this
will bring, but I feel like it can resolve this issue I am dealing with (if
this approach is better, I will rely on aspect oriented programming to
decorate db functions in
https://github.com/meren/PaPi/blob/master/PaPi/db.py).

Best,


Reply to this email directly or view it on GitHub
#33 (comment).

@meren
Copy link
Member Author

meren commented Feb 8, 2015

If you knew about the db configuration you would have decided to come to the lab with a bucket of gasoline and set my computer on fire.

I am changing the classes a bit so the database connection is not made until the data is ready for submission. I will see if this changes anything, but it definitely makes more sense as you pointed out.

Best,

@annaship
Copy link

annaship commented Feb 8, 2015

:-)

On Sun, Feb 8, 2015 at 3:50 PM, A. Murat Eren notifications@github.com
wrote:

If you knew about the db configuration you would have decided to come to
the lab with a bucket of gasoline and set my computer on fire.

I am changing the classes a bit so the database connection is not made
until the data is ready for submission. I will see if this changes
anything, but it definitely makes more sense as you pointed out.

Best,


Reply to this email directly or view it on GitHub
#33 (comment).

meren added a commit that referenced this issue Feb 8, 2015
…cture of annotation library. this now passes my small tests, and will see how it porforms on very large datasets. Related #33
@meren
Copy link
Member Author

meren commented Feb 9, 2015

I just finished testing on the servers, commit 34d7342 worked like magic, so this is resolved! :)

@meren meren closed this as completed Feb 9, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants