crunch throws an error with failed d45 type #5
Good point about the documentation. We could use:
from datetime import date, datetime
import arrow # another popular option
str(date.today()) # returns '2021-09-10'
str(datetime.now()) # returns '2021-09-10 12:03:37.940927'
str(arrow.now()) # returns '2021-09-10T12:04:02.268544+02:00'
Would that work for your use case?
Am I correct that arrow includes the timezone and datetime relies on UTC? I am an absolute beginner in Python, so I have no idea what packages are out there, what they do, or how to adapt the examples in the docs to my needs ;-). I basically used R to generate the csv in the desired structure and then ran the following in sequence:
import D47crunch
mydata = D47crunch.D47data()
mydata.read('/home/japhir/SurfDrive/PhD/programming/dataprocessing/out/rawdata.csv')
mydata.wg() # as an aside, where do I tell it the d13C and d18O of the working gas?
mydata.plot_distribution_of_analyses() # gives an unusable plot if you have many measurements, in my case 1779 measurements only (that's a tiny subset of everything since may 2021)
mydata.crunch() # this gave me the error if I didn't make sure UID and Session were characters
mydata.standardize() # this still gives me the error in #6
mydata.summary(verbose = True, save_to_file = False) # can only reach this step with the simulated data
The #6 in the comment of the code block doesn't create a link, so here it is.
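On the timezone question above: as far as the standard library goes, `datetime.now()` returns a naive local time with no timezone attached, rather than UTC, while passing `timezone.utc` gives an aware UTC timestamp. A minimal sketch using only the standard library (the variable names are illustrative):

```python
from datetime import datetime, timezone

# Naive timestamp: local wall-clock time, no timezone information attached.
naive = datetime.now()

# Aware timestamp: explicitly in UTC, so the offset is part of the value.
aware = datetime.now(timezone.utc)

print(naive.tzinfo)       # None
print(aware.isoformat())  # ends with '+00:00'
```

arrow's output in the earlier example carries an offset (`+02:00`), which is why it looks different from the two standard-library strings.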
data = [{k: v if k in ['UID', 'Session', 'Sample'] else smart_type(v) for k,v in zip(txt[0], l) if v != ''} for l in txt[1:]]
if session != '':
    for r in data:
        r['Session'] = session
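To illustrate what the snippet above does, here is a self-contained sketch. The `smart_type` below is a hypothetical stand-in (the real D47crunch helper may behave differently), and `txt` is made-up example data. The point is that `UID`, `Session`, and `Sample` stay strings, while every other non-empty field is converted to a number where possible.

```python
def smart_type(value):
    """Hypothetical stand-in: return an int or float if possible, else the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

# Made-up example: a header row followed by data rows, all fields read as strings.
txt = [
    ['UID', 'Session', 'Sample', 'd45'],
    ['1', '2020-01-03', 'ETH-1', '5.79'],
    ['2', '2020-01-03', 'ETH-2', ''],
]

data = [
    {k: v if k in ['UID', 'Session', 'Sample'] else smart_type(v)
     for k, v in zip(txt[0], l) if v != ''}
    for l in txt[1:]
]

print(data[0])  # {'UID': '1', 'Session': '2020-01-03', 'Sample': 'ETH-1', 'd45': 5.79}
print(data[1])  # {'UID': '2', 'Session': '2020-01-03', 'Sample': 'ETH-2'}
```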
>>> mydata.standardize()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/tmp/pyPjNI3W", line 3, in <module>
File "/tmp/babel-yqvP9s/python-GXTFNT", line 1, in <module>
mydata.standardize()
File "/home/japhir/SurfDrive/PhD/programming/apply_D47crunch/D47crunch.py", line 1006, in newfun
out = oldfun(*args, **kwargs)
File "/home/japhir/SurfDrive/PhD/programming/apply_D47crunch/D47crunch.py", line 1618, in standardize
params.add(f'a_{s}', value = 0.9)
File "/usr/lib/python3.9/site-packages/lmfit/parameter.py", line 373, in add
self.__setitem__(name, Parameter(value=value, name=name, vary=vary,
File "/usr/lib/python3.9/site-packages/lmfit/parameter.py", line 137, in __setitem__
raise KeyError("'%s' is not a valid Parameters name" % key)
KeyError: "'a_2020_01_03T00:00:00Z' is not a valid Parameters name"
If I then re-implement my manual fix to convert the Session to character prior to export to csv and re-run, I get the other issue, #6, where the Sample names cause problems:
>>> mydata.standardize()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/tmp/pykIhOHv", line 3, in <module>
File "/tmp/babel-yqvP9s/python-EUrohX", line 1, in <module>
mydata.standardize()
File "/home/japhir/SurfDrive/PhD/programming/apply_D47crunch/D47crunch.py", line 1006, in newfun
out = oldfun(*args, **kwargs)
File "/home/japhir/SurfDrive/PhD/programming/apply_D47crunch/D47crunch.py", line 1625, in standardize
params.add(f'D{self._4x}_{pf(sample)}', value = 0.5)
File "/usr/lib/python3.9/site-packages/lmfit/parameter.py", line 373, in add
self.__setitem__(name, Parameter(value=value, name=name, vary=vary,
File "/usr/lib/python3.9/site-packages/lmfit/parameter.py", line 137, in __setitem__
raise KeyError("'%s' is not a valid Parameters name" % key)
KeyError: "'D47_AU002_(2)' is not a valid Parameters name"
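Both tracebacks come from lmfit rejecting a parameter name. As far as I understand, lmfit requires names that are valid Python identifiers, so characters such as `:`, `(` and `)` are not allowed. One possible workaround, sketched here with a hypothetical helper `to_param_name` (not part of D47crunch or lmfit), is to replace the offending characters before building the name:

```python
import re

def to_param_name(prefix, label):
    """Hypothetical helper: build an identifier-safe name from a prefix and a label."""
    # Replace every character that is not an ASCII letter, digit, or underscore.
    safe = re.sub(r'\W', '_', str(label), flags=re.ASCII)
    return f'{prefix}_{safe}'

print(to_param_name('a', '2020_01_03T00:00:00Z'))  # a_2020_01_03T00_00_00Z
print(to_param_name('D47', 'AU002_(2)'))           # D47_AU002__2_
```

Note that this mapping is lossy: two distinct labels such as `A(1)` and `A_1_` would end up with the same name, so a real implementation would also need to check for collisions.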
for k, r in enumerate(mydata):
    try:
        r['d45'] / 1000
    except TypeError:
        print(k, r['d45'], type(r['d45']))
This will print out the index, value, and type of every d45 entry that cannot be divided by a number.
Agreed in principle, but such a plot could be pretty useful for anybody; there's nothing specific to Jens' study here. Again, if it works poorly for your use case, let me know and we can improve it if you feel it's worth it.
Aargh, now it's breaking again, but this time I think it's because the Sample names in the subset that is causing issues are things like I've looked into how my R csv export decides to quote and/or escape things. By default it quotes "as needed", meaning that the above one with parentheses etc. gets quoted, but a simple After specifically filtering out any Sample with
Wait, so Python's rows are by default lists that can contain different types for each row? o.O
I realized your problem is definitely caused by commas in sample names. The csv file uses commas as separators by default so that inserts spurious columns in your data. Using tabs or semicolons instead should help a lot.
Not always. Python has lists of items, where each item can be any object, even another list. There is no concept of rows for such lists. In this case There is no obligation that all items in
for r in mydata:
    if 'mineralogy' in r:
        print(f'Analysis {r["UID"]}: {r["mineralogy"]}')
Ah yes, that must be it. Is there any escape syntax that you can use during export to keep the original identifier_1 column? The R export has quoted the fields, so the commas are technically correct csv syntax, right? The link I shared before has some code using Python's csv package that specifies how to handle quote syntax etc.
Thanks for the elaboration :)
I confess that I never took the time to use a real csv parser as you suggest. I'll add this as a todo item for the dev version.
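For reference, Python's standard `csv` module already handles quoted fields, so a comma inside a quoted sample name does not create an extra column. A minimal sketch with made-up data (the sample name below is hypothetical, not taken from the actual file):

```python
import csv
import io

# Made-up CSV content: the second field of the data row is quoted and contains a comma.
raw = 'UID,Sample,d45\n1,"AU002, rerun",5.79\n'

# Naive splitting on commas breaks the quoted field into two columns.
naive = [line.split(',') for line in raw.splitlines()]

# csv.reader respects the quotes and keeps the field intact.
parsed = list(csv.reader(io.StringIO(raw)))

print(len(naive[1]))  # 4
print(parsed[1])      # ['1', 'AU002, rerun', '5.79']
```

With a real file, the same reader can be given an open file handle instead of the `io.StringIO` wrapper used here.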
Hi Mathieu!
I've been trying out D47crunch for our data!
The first issue I had was that our UIDs and Sessions were integers and datetimes respectively, and that I had to cast them to character to get
mydata.crunch()
to work. Otherwise it would throw this error:
I think I managed to fix this by first casting them to characters and then running it again.