This repository has been archived by the owner on Apr 22, 2024. It is now read-only.

Insert pandas dataframe into InfluxDB issues #576

Closed
dz123 opened this issue Apr 15, 2018 · 3 comments

Comments

@dz123

dz123 commented Apr 15, 2018

@tzonghao @aviau @xginn8 @sebito91

I am trying to store some trading data into InfluxDB using the DataFrameClient with the write_points method. I have read the documentation online as well as the following two issues, #286 and #510.

Here is what my data looks like:

Date                       Ticker  Close  Volume
2018-04-15 00:00:00+00:00  MSFT    1.3    2.50
2018-04-14 00:00:00+00:00  MSFT    3.5    4.24
2018-04-15 00:00:00+00:00  AAPL    7.0    11.00
2018-04-14 00:00:00+00:00  AAPL    6.0    1.00

Below is my code:

client = DataFrameClient(host, port, user, password, dbname)
headers = ["Date","Ticker","Close", "Volume"]
data = [["2018-04-15","MSFT",1.3,2.5], ["2018-04-14","MSFT",3.5,4.24], ["2018-04-15","AAPL",7,11], ["2018-04-14","AAPL",6,1]]
df = pd.DataFrame(data, columns = headers)
df.Date = pd.to_datetime(df["Date"])
df = df.set_index("Date")
tags = { "Ticker": df[["Ticker"]]}
client.write_points(df, 'test', tags = tags, protocol = "json")

However, this gives the error message below when I call write_points:

InfluxDBClientError: 400: {"error":"partial write: unable to parse 'test,Ticker=\\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ Ticker': missing fields\nunable to parse 'Date\\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ \\ ':

A similar message shows up when I separate the "Ticker" tag column out of the data frame I write:

timeValues = df[["Close","Volume"]]
client.write_points(timeValues, 'test', tags = tags, protocol = "json")

This leads to the same error message as above. I have three questions I would really love to get help on:

  1. How do I fix what I am doing above? Is it the protocol that's wrong? In the documentation, the comment suggests using "json" as a workaround for some reported bugs.
  2. I also have the same timestamp for two different tag values (i.e. the same dates for both MSFT and AAPL). Is this an issue when I write into the database?
  3. For the time series I am trying to write, there will be NaN values for some of the tickers. For example, what if the Volume field value for 4-14 is NaN for AAPL? Will this still work? There were a few bug reports that seemed to suggest I can't write NaN into the database. EDIT: I found issue #422 (DataFrame write "nan", "inf" error in influxdb), and it seems the workaround is to have separate measurements by field and then drop the NaN rows before writing to the database.
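
A minimal sketch of the workaround mentioned in the EDIT above, using only pandas: split the frame into one sub-frame per field and drop the rows where that field is NaN. The helper name `split_by_field` and the NaN placed in AAPL's 2018-04-14 volume are illustrative assumptions, not part of the original code or of the influxdb library.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with AAPL's 2018-04-14 volume set to NaN
# (an assumption made here to illustrate the case asked about).
df = pd.DataFrame(
    {
        "Ticker": ["MSFT", "MSFT", "AAPL", "AAPL"],
        "Close": [1.3, 3.5, 7.0, 6.0],
        "Volume": [2.5, 4.24, 11.0, np.nan],
    },
    index=pd.to_datetime(
        ["2018-04-15", "2018-04-14", "2018-04-15", "2018-04-14"], utc=True
    ),
)
df.index.name = "Date"


def split_by_field(frame, tag_column, field_columns):
    """Hypothetical helper: map each field name to a frame holding the tag
    column plus that single field, with rows where the field is NaN dropped."""
    return {
        field: frame[[tag_column, field]].dropna(subset=[field])
        for field in field_columns
    }


per_field = split_by_field(df, "Ticker", ["Close", "Volume"])
```

Each sub-frame in `per_field` could then be written as its own measurement, so a missing Volume value does not force dropping the Close value recorded at the same timestamp.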
@tzonghao
Contributor

@dz123

  1. I've not tried to run your code, but it seems you'd want to use tag_columns=['Ticker'] instead of tags=tags.
  2. Should be OK, since InfluxDB series consist of "measurement" + "tags", i.e. different tags == different series.
  3. NaN and inf are not supported in InfluxDB, see here.
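
A sketch of the change suggested in point 1, assuming the `influxdb` Python package's `DataFrameClient`. The `tags` argument takes a dict of fixed tag values applied to every point, so passing a DataFrame there is likely what produced the garbled `Ticker=\ \ ...` line in the error; `tag_columns` instead names DataFrame columns whose per-row values become tags. `build_frame` and `write_frame` are hypothetical helpers, the connection parameters are placeholders, and the write path has not been tested against a live server here. The `protocol` argument is left at the library default, which is also an assumption.

```python
import pandas as pd


def build_frame():
    """Build the sample frame from the question, indexed by timestamp."""
    headers = ["Date", "Ticker", "Close", "Volume"]
    data = [
        ["2018-04-15", "MSFT", 1.3, 2.5],
        ["2018-04-14", "MSFT", 3.5, 4.24],
        ["2018-04-15", "AAPL", 7.0, 11.0],
        ["2018-04-14", "AAPL", 6.0, 1.0],
    ]
    df = pd.DataFrame(data, columns=headers)
    df["Date"] = pd.to_datetime(df["Date"])
    return df.set_index("Date")


def write_frame(df, host, port, user, password, dbname):
    """Write df to the 'test' measurement, using the Ticker column as a tag."""
    # Imported here so the frame-building part runs without the client installed.
    from influxdb import DataFrameClient

    client = DataFrameClient(host, port, user, password, dbname)
    # tag_columns lists column names; the remaining columns are written as fields.
    client.write_points(df, "test", tag_columns=["Ticker"])


df = build_frame()
# write_frame(df, "localhost", 8086, "user", "password", "mydb")  # placeholder values
```

Because Ticker becomes a tag, the MSFT and AAPL rows that share a timestamp belong to different series, which matches point 2.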

@dz123
Author

dz123 commented Apr 16, 2018

@tzonghao Thanks! That helped!

@kalikim

kalikim commented Nov 27, 2020

@tzonghao Thanks! That helped!

@dz123 What did you do about this error? What did you change?
