job_table() returning empty dataframe #517
Did you use the pyiron_base version from the Git repository or from conda? @liamhuber Can this be related to the recent database changes?
In addition, if the new jobs appear but the old ones do not, can you post the database entries to compare how they differ? You can use https://sqlitebrowser.org to open the SQLite files in a graphical interface.
I touched how the database access strings are parsed, so it's definitely possible this introduced a bug. There is almost certainly not a problem when using sqlite, as I've been running the new code on my local machine for quite some time without trouble. I think I've done stuff on the cluster, in which case Postgres is probably OK? I haven't done anything with MySQL though, so I couldn't say one way or the other there.
Aha @Leimeroth, I just remembered a big change: as before, we use XOR logic to control the configuration, but we changed the order of priority! It used to be user input XOR config file XOR environment variables. Now it's user XOR env XOR config. So if you have any pyiron env vars (except PYIRONCONFIG), then you'll get these plus the defaults, ignoring your config file entirely. If you have your database defined in a config file, please check if you have any PYIRON vars in os.environ that would screw you up.
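A minimal sketch of that priority logic (illustrative only, not the actual pyiron code; the function and variable names here are made up):

```python
import os

def choose_settings_source(user_input, config_file_settings):
    # XOR priority: the first source that provides anything wins outright.
    # 1st priority: explicit user input
    if user_input:
        return user_input
    # 2nd priority: any PYIRON* environment variables (except PYIRONCONFIG);
    # if any of these exist, the config file is ignored entirely
    env_settings = {
        key: value
        for key, value in os.environ.items()
        if key.startswith("PYIRON") and key != "PYIRONCONFIG"
    }
    if env_settings:
        return env_settings
    # 3rd priority: the config file (this used to outrank the environment)
    return config_file_settings
```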
Git versions
For new jobs (those I see in pr.job_table() now), the projectpath value is NULL and the complete path is stored in the project column. For old jobs, projectpath is /home/pyiron/projects/ and project is the remaining part of the path to the working directory.
I am using sqlite
I can't find any pyiron-related environment variable, and I do not know where it would come from. It is possible that I overlooked something, but checking with a small loop over os.environ (roughly the following)
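```python
import os

# Look for "pyiron" in both variable names and values
for key, value in os.environ.items():
    if "pyiron" in key.lower() or "pyiron" in value.lower():
        print(key, "=", value)
```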
also didn't find anything besides my PYTHONPATH entries pointing to the git pyiron versions.
I did some tests, and I can get the full job_dict when running the call below. In the _job_dict() method the "project" column is set from project_path, which I find confusing tbh. But maybe this also confused someone else and it is set to a different value somewhere?
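The call was roughly the following (the argument names are my reading of the jobtable code and may differ between versions):

```python
from pyiron_base import Project
from pyiron_base.database.jobtable import _job_dict

pr = Project("/home/pyiron/projects/some_project")  # placeholder path

# Passing the *full* path as project_path brings back all the old jobs
jobs = _job_dict(
    database=pr.db,
    sql_query=None,
    user=pr.user,
    project_path=pr.path,  # the complete path, not the root/project split
    recursive=True,
)
print(len(jobs))
```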
Ok, yes, if that is the case then you should be looking at the right database, so no worries there. And your snippet checking for env contamination looks good to me.
I still can't rule out that my changes caused this, but it sounds less likely. I haven't touched that part of the code.
Do you know where your HEAD was before this? I think this bug might be nasty enough to warrant running a git bisection test. Once we know the exact commit that breaks things we'll stand a chance, but right now the behaviour is extremely mysterious to me.
Might be something I messed up in #336, but it's weird that it only comes up today.
Git bisecting shows that #486 introduced the bug. I still think this could be some mix-up with project and project_path, since all new entries have an empty project_path, and the full path up to the working directory is stored under project instead.
Perfect, thanks! Yeah, your description is reasonable. I'm totally stumped right now because when I look at the diff I don't see anything in that direction getting touched. But now that we know the PR the problem must be soluble! I'll keep looking at it today.
@pmrv you mentioned reproducing the error when you use the staging database... do you invoke one of the database-switching functions?
I can use this. Both older and newer jobs have …, and I also note that …
I did it by updating the configuration directly.
Ok, then I really don't know what should be happening. Calling … However, while this properly updates …
This confuses me, because ALL my old jobs have a project_path and their project is never an absolute path (i.e. it never starts with /). Whether the project entry ends with _hdf5 or not depends on whether the jobs are child jobs, but not at all on whether they are new or old.
Aha, great point! Yes, I think my 'cluster' job is a child, so that at least makes sense.
This is still a mystery to me too.
The default value of PROJECT_CHECK_ENABLED is False now.
Yup, this then gets used when splitting the full path into root_path and project_path.
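An illustrative sketch (not the actual pyiron code) of how the flag plausibly changes the behaviour:

```python
def split_path(full_path, registered_paths, project_check_enabled):
    # With the check enabled, the full path is matched against the project
    # paths registered in the configuration and split at the boundary
    if project_check_enabled:
        for root in registered_paths:
            if full_path.startswith(root):
                return root, full_path[len(root):]
        raise ValueError(f"{full_path} is not below any registered project path")
    # With the check disabled there is nothing to split against, so the whole
    # path lands in "project" and "projectpath" stays NULL -- exactly the
    # symptom seen in the job table above
    return None, full_path
```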
I added PROJECT_CHECK_ENABLED=True to my .pyiron file and now it works again, so I guess this causes the problem.
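For reference, the relevant lines in ~/.pyiron now look something like this (the PROJECT_PATHS value is specific to my setup):

```ini
[DEFAULT]
PROJECT_CHECK_ENABLED = True
PROJECT_PATHS = /home/pyiron/projects/
```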
Man, that is really bizarre. I did the same (well, actually I reverted the default as seen over in #519), but my job table still shows None for the projectpath. At least this relieves some time pressure in solving this now that it's working for you again... |
We really need more simple tests. I saw @liamhuber added some tests in pyiron_base/tests/database/test_manager.py (lines 32 to 38 in 4f50b8d).
However, they check whether the PROJECT_PATH is correctly added to these paths or not. I would like to have a test checking for the correct split of the whole path into root_path and project_path (in terms of the pr attribute names, not the ones in the db), something like the sketch below. Actually, we should probably always write a test when we find a bug, to make our CI fail as well 😃
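A rough sketch only; a real test would need more careful setup/teardown:

```python
import unittest

from pyiron_base import Project


class TestProjectPathSplit(unittest.TestCase):
    def test_root_and_project_path(self):
        # Create a project below one of the registered project paths
        pr = Project("test_path_split")
        # An empty root_path was exactly the symptom of this bug
        self.assertIsNotNone(pr.root_path)
        # The full path should decompose into root_path + project_path
        self.assertTrue(pr.path.startswith(pr.root_path))
        self.assertTrue(pr.path.endswith(pr.project_path))
        pr.remove(enable=True)


if __name__ == "__main__":
    unittest.main()
```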
|
💯 Does anyone want to do this? I won't have time to look at it this week. Since I reverted the default behaviour over in #519, and since it's now working for @Leimeroth, I guess we can close? I am also amenable to keeping it open until someone adds a test for the project attributes.
Tests in #520 😄 |
This issue is also present now on the cluster, but setting PROJECT_CHECK_ENABLED=True fixes it there as well.
Fix is sitting in #519
Great! Testing the robot now with a merge of #519 and #518 on ~600 jobs, so we get an idea of how long it will take. |
It's still running, but the short news is that it will take forever to convert all files in the current way. I'm currently waiting for more than half an hour on the rewrite of a single ~100MB file. I did take the opportunity to profile the conversion in the meantime: a good 20% of runtime is spent opening and closing files! Notice also the disproportionate amount of reading vs. writing, which suggests we do a lot of auxiliary reading before actually fetching and writing data.
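For anyone who wants to reproduce the measurement, roughly like this (convert_project is a placeholder for whatever entry point the conversion script exposes):

```python
import cProfile
import pstats

# Profile the conversion of a single project and dump the stats to a file
cProfile.run("convert_project('/path/to/project')", "convert.prof")

# Sort by cumulative time to see how much goes into open/close vs. read/write
stats = pstats.Stats("convert.prof")
stats.sort_stats("cumulative").print_stats(20)
```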
Well, the imbalance between read and write is kind of expected with the awful 'read array and then array.tolist()' before we actually have got it... However, the difference in the number of calls is also big...
Anyway, I've updated the robot accordingly. Once I've converted all of those 600 jobs (which tqdm estimates at ~1h), this should be good to go from my side. Still, it'd be nice if someone could test the script on some of their data before we unleash it onto the cluster.
Update: It took around 1h 30min for 1.3GB of data combined. I didn't see any errors, so I think it's good to go, but I will try to move on to more data now and see if anything happens.
Thanks for testing this rigorously! I just do not have data to test 😆 |
And if we patch the robot to be faster in an upcoming release, this is fine from my perspective. |
Currently testing on my full data set and patching small errors that come up, will update before I leave the office. |
ok, please just update #518 with these patches and merge. Then I will make a new release also lifting the h5io restrictions on conda this evening 🎉 |
I pushed it accidentally to #525, but since it includes #518 I suppose this will still work for you. An update from my extended testing: it showed two failure modes, which I have now addressed in the script.
Still, there was a large number of unaffected jobs in my ~100GB of data, so we can actually be a bit optimistic about doing the conversion overnight.
Yes, I will cherry-pick them over to the other branch and get everything merged and released! |
After updating conda packages and pyiron versions, pr.job_table() returns an empty dataframe. I have absolutely no clue what causes this problem.
The database seems to be completely fine. I tested it by running sqlite3's PRAGMA integrity_check, by connecting to my database backup and trying pr.job_table() (still an empty df), and by having a look at the database using sqlitebrowser.
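The integrity check can be run directly on the database file, e.g. (the path is a placeholder for wherever the pyiron database lives):

```python
import sqlite3

# Open the pyiron SQLite file directly and run the built-in consistency check
conn = sqlite3.connect("/path/to/pyiron.db")
print(conn.execute("PRAGMA integrity_check").fetchone())  # ('ok',) if healthy
conn.close()
```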
I also tried downgrading sqlalchemy from 1.4.27 to 1.4.26 and checked that the connection in the Settings is correct.
If I create new jobs, they get a correct id (starting with a number over 300000, not with 0) and the new jobs are shown correctly in the dataframe, but all jobs from before the update are not shown.
@jan-janssen @pmrv mentioning you to get you notified. If you have any idea what causes this or how to fix it, please let me know; this really messes with my work.