This repository has been archived by the owner on Apr 22, 2024. It is now read-only.

InfluxDB-python version 5.3.0 chunk=True #820

Open
xiandong79 opened this issue Apr 13, 2020 · 23 comments

@xiandong79 commented Apr 13, 2020

  • InfluxDB-python version: 5.3.0
  • Python version: 3.7.4
  • Operating system version: macOS 10.14.5

msgpack.exceptions.ExtraData: unpack(b) received extra data

Traceback (most recent call last):
  File "/Users/dong/Desktop/mosaic-research/analysis/analysis.py", line 17, in <module>
    public_book = mosaic_client.public_book(exchange=exchange, instrument=instrument, ts_start=ts, ts_end=ts+save_interval, depth=1)
  File "/Users/dong/Desktop/mosaic-research/py_mosaic_client/py_mosaic_client/mosaic_client.py", line 74, in public_book
    result = self.client.query(f'SELECT * FROM "l2_book-{exchange}" WHERE time > {ts_start} AND time <= {ts_end}', chunked=True, chunk_size=10000)
  File "/Users/dong/opt/anaconda3/lib/python3.7/site-packages/influxdb/client.py", line 518, in query
    expected_response_code=expected_response_code
  File "/Users/dong/opt/anaconda3/lib/python3.7/site-packages/influxdb/client.py", line 352, in request
    raw=False)
  File "msgpack/_unpacker.pyx", line 209, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.
@xiandong79 (Author)

I think it may be related to this PR: c903d73

@sebito91 (Contributor)

Thanks for reporting this @xiandong79, I'll investigate ASAP. I should have added a test to the dataframe_client for this.

@sebito91 sebito91 added this to the v5.3.1 milestone Apr 14, 2020
@sebito91 sebito91 added the bug label Apr 14, 2020
@hrbonz (Contributor) commented Apr 14, 2020

I can take a look too if that helps, I haven't come across that issue though.

@xiandong79 (Author)

Version 5.2.3 works well.

@hiksuman

I'm having the same issue querying from both Influx 1.7.10 and 1.7.7.
Interestingly, with Influx 1.0.2 the bug is not present.

@sebito91 (Contributor)

There are a lot of differences between 5.2.3 and 5.3.0, which is why we stepped a minor release instead of a point release.

@hrbonz if you want to take a look that would be AWESOME!

@laurikoobas commented Apr 16, 2020

I am getting a different error, but seemingly from a similar place.
InfluxDB-python version: 5.3.0
Python version: 3.7.4
Operating system version: Ubuntu 16.04


influxdb/client.py in request(self, url, method, params, data, stream, expected_response_code, headers)
    350                 packed=response.content,
    351                 ext_hook=_msgpack_parse_hook,
--> 352                 raw=False)
    353         else:
    354             response._msgpack = None

msgpack/_unpacker.pyx in msgpack._cmsgpack.unpackb()
`UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 8: invalid continuation byte`

@chaconpiza commented May 6, 2020

Similar issue with a SHOW DIAGNOSTICS query:

  • InfluxDB 1.7.6
  • InfluxDB-python version: 5.3.0
  • Python version: 3.6.9
  • Operating system version: Ubuntu 18.04
python3
Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from influxdb import client
>>> influxdb_client = client.InfluxDBClient("192.168.10.6", "8086")
>>> influxdb_client.ping()
'1.7.6'
>>> influxdb_client.query('SHOW DIAGNOSTICS')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vagrant/.local/lib/python3.6/site-packages/influxdb/client.py", line 518, in query
    expected_response_code=expected_response_code
  File "/home/vagrant/.local/lib/python3.6/site-packages/influxdb/client.py", line 352, in request
    raw=False)
  File "msgpack/_unpacker.pyx", line 213, in msgpack._cmsgpack.unpackb
ValueError: Unpack failed: incomplete input
  • With InfluxDB-python version 5.2.3 the bug is not present.

openstack-gerrit pushed a commit to openstack/requirements that referenced this issue May 7, 2020
influxdb release 5.3.0 includes an open bug that breaks many of the
calls to influxdb. Like for example 'SHOW DIAGNOSTICS' [1]

[1] influxdata/influxdb-python#820

Change-Id: Ie802f865ceacbecd5887817fd0f25137d28f5350
openstack-gerrit pushed a commit to openstack/openstack that referenced this issue May 7, 2020
* Update requirements from branch 'master'
  - Merge "Block influxdb==5.3.0"
  - Block influxdb==5.3.0
    
    influxdb release 5.3.0 includes an open bug that breaks many of the
    calls to influxdb. Like for example 'SHOW DIAGNOSTICS' [1]
    
    [1] influxdata/influxdb-python#820
    
    Change-Id: Ie802f865ceacbecd5887817fd0f25137d28f5350
@yozik04 commented Jun 10, 2020

I can confirm that a query with chunked=True does not work on 5.3.0.

@xiandong79 (Author)

I can confirm that a query with chunked=True does not work on 5.3.0.

@nikparmar

Hello Team,
Any workaround for this issue?

@yozik04 commented Jun 21, 2020

Hello Team,
Any workaround for this issue?

Sure, use <5.3.0.

@marko-asplund

Having the same issue - any progress?

@AnkitSinghvi99

Hi,

Having the same issue. Any solution?

Debian GNU/Linux 9.4 (stretch)
python 2.7.13
Influx 1.8.3
Influxdb 5.3.1
msgpack 1.0.2

msgpack.exceptions.ExtraData: unpack(b) received extra data.

Traceback (most recent call last):
  File "/code/apps/FuelChangeoverPlot.py", line 179, in exportFromDb
    data = data_fetcher.fetch_fuel_change_over_plot(start_time=rangeStart, end_time=rangeEnd)
  File "/code/db_interface/data_fetcher.py", line 35, in fetch_fuel_change_over_plot
    df_dict = db_connector.query_for_single_measurement_range(
  File "/code/db_interface/db_connector.py", line 80, in query_for_single_measurement_range
    df_dict = client.query(
  File "/usr/local/lib/python3.9/site-packages/influxdb/_dataframe_client.py", line 199, in query
    results = super(DataFrameClient, self).query(query, **query_args)
  File "/usr/local/lib/python3.9/site-packages/influxdb/client.py", line 521, in query
    response = self.request(
  File "/usr/local/lib/python3.9/site-packages/influxdb/client.py", line 358, in request
    response._msgpack = msgpack.unpackb(
  File "msgpack/_unpacker.pyx", line 202, in msgpack._cmsgpack.unpackb

@srijan commented Jan 28, 2021

There are actually two issues here:

  1. Unpack issue when using msgpack. I did not debug this further, but here's a workaround that works for me: use JSON instead of msgpack (a fuller sketch follows this list). This can be forced using:
    client = InfluxDBClient(host, port, u, p, db, headers={'Accept': 'application/json'}, gzip=True)

  2. Even when using the above, DataFrameClient does not work. This is because DataFrameClient was not updated along with this commit c903d73.
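A minimal sketch of the workaround from item 1, assuming placeholder connection details and measurement names (none of these come from this thread): forcing Accept: application/json makes the client skip the msgpack decoding path, and the chunked query then parses normally.

from influxdb import InfluxDBClient

# Placeholder connection details; adjust for your setup.
client = InfluxDBClient(
    host="localhost",
    port=8086,
    username="root",
    password="root",
    database="mydb",
    headers={"Accept": "application/json"},  # force JSON instead of msgpack
    gzip=True,
)

# Chunked query; the aggregated ResultSet exposes get_points().
result = client.query(
    "SELECT * FROM cpu WHERE time > now() - 1h",
    chunked=True,
    chunk_size=10000,
)
for point in result.get_points():
    print(point)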

@hrbonz (Contributor) commented Apr 26, 2021

Debian GNU/Linux bullseye/sid
Python 3.9.2
influxdb-python master branch
Influxdb 1.8.5 and 1.7.3
msgpack 1.0.2

Ran my test scripts with export MSGPACK_PUREPYTHON=1 to use the Python implementation of msgpack rather than the C extension; easier for debugging.
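For reference, a small sketch of forcing the same fallback from inside a script instead of the shell; this assumes (as in msgpack 1.0.x) that MSGPACK_PUREPYTHON is checked when msgpack is first imported, so it must be set before importing msgpack or influxdb.

import os

# Must be set before the first import of msgpack (influxdb imports it too).
os.environ["MSGPACK_PUREPYTHON"] = "1"

import msgpack

# With the variable set, the pure-Python fallback is used instead of the C extension.
print(msgpack.Packer)  # e.g. <class 'msgpack.fallback.Packer'>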

Analysis

I've looked into this issue today; it looks to me like a combination of two problems:

  • I forgot to add the stream/chunk changes to DataFrameClient because I didn't even realize it existed; I'll submit a PR with the proper changes for it.

  • If I run a 'SHOW DIAGNOSTICS' query with headers set to accept JSON, I get the following:

{
  "results": [
    {
      "statement_id": 0,
      "series": [
        {
          "name": "build",
          "columns": [
            "Branch",
            "Build Time",
            "Commit",
            "Version"
          ],
          "values": [
            [
              "1.7",
              "",
              "ff383cdc0420217e3460dabe17db54f8557d95b6",
              "1.7.8"
            ]
          ]
        },
        {
          "name": "config",
          "columns": [
            "bind-address",
            "reporting-disabled"
          ],
          "values": [
            [
              "127.0.0.1:8098",
              true
            ]
          ]
        },
        {
          "name": "config-coordinator",
          "columns": [
            "log-queries-after",
            "max-concurrent-queries",
            "max-select-buckets",
            "max-select-point",
            "max-select-series",
            "query-timeout",
            "write-timeout"
          ],
          "values": [
            [
              "0s",
              0,
              0,
              0,
              0,
              "0s",
              "10s"
            ]
          ]
        },
        {
          "name": "config-cqs",
          "columns": [
            "enabled",
            "query-stats-enabled",
            "run-interval"
          ],
          "values": [
            [
              true,
              false,
              "1s"
            ]
          ]
        },
        {
          "name": "config-data",
          "columns": [
            "cache-max-memory-size",
            "cache-snapshot-memory-size",
            "cache-snapshot-write-cold-duration",
            "compact-full-write-cold-duration",
            "dir",
            "max-concurrent-compactions",
            "max-index-log-file-size",
            "max-series-per-database",
            "max-values-per-tag",
            "series-id-set-cache-size",
            "wal-dir",
            "wal-fsync-delay"
          ],
          "values": [
            [
              1073741824,
              26214400,
              "10m0s",
              "4h0m0s",
              "/var/lib/influxdb/data",
              0,
              1048576,
              1000000,
              100000,
              100,
              "/var/lib/influxdb/wal",
              "0s"
            ]
          ]
        },
        {
          "name": "config-httpd",
          "columns": [
            "access-log-path",
            "bind-address",
            "enabled",
            "https-enabled",
            "max-connection-limit",
            "max-row-limit"
          ],
          "values": [
            [
              "",
              ":8096",
              true,
              false,
              0,
              0
            ]
          ]
        },
        {
          "name": "config-meta",
          "columns": [
            "dir"
          ],
          "values": [
            [
              "/var/lib/influxdb/meta"
            ]
          ]
        },
        {
          "name": "config-monitor",
          "columns": [
            "store-database",
            "store-enabled",
            "store-interval"
          ],
          "values": [
            [
              "_internal",
              true,
              "10s"
            ]
          ]
        },
        {
          "name": "config-precreator",
          "columns": [
            "advance-period",
            "check-interval",
            "enabled"
          ],
          "values": [
            [
              "30m0s",
              "10m0s",
              true
            ]
          ]
        },
        {
          "name": "config-retention",
          "columns": [
            "check-interval",
            "enabled"
          ],
          "values": [
            [
              "30m0s",
              true
            ]
          ]
        },
        {
          "name": "config-subscriber",
          "columns": [
            "enabled",
            "http-timeout",
            "write-buffer-size",
            "write-concurrency"
          ],
          "values": [
            [
              true,
              "30s",
              1000,
              40
            ]
          ]
        },
        {
          "name": "network",
          "columns": [
            "hostname"
          ],
          "values": [
            [
              "db01"
            ]
          ]
        },
        {
          "name": "runtime",
          "columns": [
            "GOARCH",
            "GOMAXPROCS",
            "GOOS",
            "version"
          ],
          "values": [
            [
              "amd64",
              2,
              "linux",
              "go1.11"
            ]
          ]
        },
        {
          "name": "system",
          "columns": [
            "PID",
            "currentTime",
            "started",
            "uptime"
          ],
          "values": [
            [
              10884,
              "2021-04-26T09:59:01.187859258Z",
              "2021-04-26T08:10:39.214602676Z",
              "1h48m21.973256582s"
            ]
          ]
        }
      ]
    }
  ]
}

When running without any headers, we get msgpack back with the following:

b'\x81\xa7results\x91\x82\xacstatement_id\x00\xa6series\x9e\x83\xa4name\xa5build\xa7columns\x94\xa6Branch\xaaBuild Time\xa6Commit\xa7Version\xa6values\x91\x94\xa31.7\xa0\xd9(ff383cdc0420217e3460dabe17db54f8557d95b6\xa51.7.8\x83\xa4name\xa6config\xa7columns\x92\xacbind-address\xb2reporting-disabled\xa6values\x91\x92\xae127.0.0.1:8098\xc3\x83\xa4name\xb2config-coordinator\xa7columns\x97\xb1log-queries-after\xb6max-concurrent-queries\xb2max-select-buckets\xb0max-select-point\xb1max-select-series\xadquery-timeout\xadwrite-timeout\xa6values\x91\x97\x00\x00\x00\x00\x83\xa4name\xaaconfig-cqs\xa7columns\x93\xa7enabled\xb3query-stats-enabled\xacrun-interval\xa6values\x91\x93\xc3\xc2\x83\xa4name\xabconfig-data\xa7columns\x9c\xb5cache-max-memory-size\xbacache-snapshot-memory-size\xd9"cache-snapshot-write-cold-duration\xd9 compact-full-write-cold-duration\xa3dir\xbamax-concurrent-compactions\xb7max-index-log-file-size\xb7max-series-per-database\xb2max-values-per-tag\xb8series-id-set-cache-size\xa7wal-dir\xafwal-fsync-delay\xa6values\x91\x9c\xb6/var/lib/influxdb/data\x00\xd2\x00\x0fB@\xd2\x00\x01\x86\xa0d\xb5/var/lib/influxdb/wal\x83\xa4name\xacconfig-httpd\xa7columns\x96\xafaccess-log-path\xacbind-address\xa7enabled\xadhttps-enabled\xb4max-connection-limit\xadmax-row-limit\xa6values\x91\x96\xa0\xa5:8096\xc3\xc2\x00\x00\x83\xa4name\xabconfig-meta\xa7columns\x91\xa3dir\xa6values\x91\x91\xb6/var/lib/influxdb/meta\x83\xa4name\xaeconfig-monitor\xa7columns\x93\xaestore-database\xadstore-enabled\xaestore-interval\xa6values\x91\x93\xa9_internal\xc3\x83\xa4name\xb1config-precreator\xa7columns\x93\xaeadvance-period\xaecheck-interval\xa7enabled\xa6values\x91\x93\xc3\x83\xa4name\xb0config-retention\xa7columns\x92\xaecheck-interval\xa7enabled\xa6values\x91\x92\xc3\x83\xa4name\xb1config-subscriber\xa7columns\x94\xa7enabled\xachttp-timeout\xb1write-buffer-size\xb1write-concurrency\xa6values\x91\x94\xc3\xd1\x03\xe8(\x83\xa4name\xa7network\xa7columns\x91\xa8hostname\xa6values\x91\x91\xa4db01\x83\xa4name\xa7runtime\xa7columns\x94\xa6GOARCH\xaaGOMAXPROCS\xa4GOOS\xa7version\xa6values\x91\x94\xa5amd64\x02\xa5linux\xa6go1.11\x83\xa4name\xa6system\xa7columns\x94\xa3PID\xabcurrentTime\xa7started\xa6uptime\xa6values\x91\x94\xd1*\x84\xc7\x0c\x05\x00\x00\x00\x00`\x86\x8e\xe5\x12J\xde\xab\xc7\x0c\x05\x00\x00\x00\x00`\x86u\x7f\x0c\xca\x93\xb4\xb21h48m22.092293879s'

Both should represent the same data, but the config-coordinator structure doesn't include all the values:
\x83\xa4name\xb2config-coordinator\xa7columns\x97\xb1log-queries-after\xb6max-concurrent-queries\xb2max-select-buckets\xb0max-select-point\xb1max-select-series\xadquery-timeout\xadwrite-timeout\xa6values\x91\x97\x00\x00\x00\x00\x83\xa4name\xaaconfig-cqs
We can see near the end of this excerpt that \x97 declares a 7-entry 'fixarray', but only four integer zeroes (\x00) follow before the \x83 that starts the next data structure ('config-cqs'); the string entries ("0s", "0s", "10s") never show up.
For this reason, I believe the bug actually exists server side. A similar issue might be generated when doing a regular query, but I couldn't figure it out. I'm also not that comfortable with Go, so I couldn't really find where this is implemented in the server.
This behavior appeared soon after my commit because 7fb5e94 changed the default headers to request msgpack instead of the default JSON.
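As a debugging aid, a rough sketch of reproducing the decode failure offline; it assumes the raw msgpack body has been saved to a local file (for example response.txt, as captured with the curl command in the follow-up comment below).

import msgpack

# Read the raw 'SHOW DIAGNOSTICS' body captured from the server.
with open("response.txt", "rb") as f:
    raw = f.read()

try:
    data = msgpack.unpackb(raw, raw=False)
    print("decoded cleanly, top-level keys:", list(data))
except msgpack.exceptions.ExtraData as exc:
    # One object decoded but bytes were left over -- the 'received extra data' symptom.
    print("extra data after first object:", exc)
except ValueError as exc:
    # Covers 'Unpack failed: incomplete input' for truncated payloads.
    print("unpack failed:", exc)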

Summary

  1. I should push a PR to implement the fixed chunked behavior in DataFrameClient.
  2. I suspect there is a bug with the msgpack implementation server side, but I can't help with this; someone with better Go knowledge should dig into that one.

@hrbonz (Contributor) commented Apr 26, 2021

@sebito91

@hrbonz (Contributor) commented Apr 27, 2021

Tried to do the request directly with curl and still got a messed-up msgpack answer with the same issue:

$ curl -G 'http://localhost:8096/query' --data-urlencode q='SHOW DIAGNOSTICS'  --header "Accept: application/x-msgpack" --header "Content-Type: application/json" -u root --output response.txt
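For completeness, a rough Python equivalent of the curl request (same assumptions: InfluxDB listening locally on port 8096, user root; fill in the password); it saves the raw msgpack body so it can be inspected with the unpack sketch above.

import requests

resp = requests.get(
    "http://localhost:8096/query",
    params={"q": "SHOW DIAGNOSTICS"},
    headers={
        "Accept": "application/x-msgpack",
        "Content-Type": "application/json",
    },
    auth=("root", ""),  # curl -u root prompts for the password; set it here
)

# Save the raw body for offline inspection.
with open("response.txt", "wb") as f:
    f.write(resp.content)
print(resp.status_code, len(resp.content), "bytes")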

@AnkitSinghvi99 commented May 5, 2021

@hrbonz @sebito91
Maybe I am asking a silly question here: is the above fix part of the currently released library or a future release? If a future release, when is it expected?

As I tested today, I still get the issue below:
msgpack.exceptions.ExtraData: unpack(b) received extra data.

@MichielBbal

Same issue here:
msgpack/_unpacker.pyx in msgpack._cmsgpack.unpackb()

ExtraData: unpack(b) received extra data.

@KirannBhavaraju commented May 27, 2021

For any future readers,

  • The error persists in 5.3.1 and in 5.3.0 as well.
  • This query works and doesn't throw msgpack.exceptions.ExtraData: unpack(b) received extra data, without adding additional headers like {'Accept': 'application/json'} and while still using msgpack, I believe. Please note that I am not using the DataFrameClient.
    This query in my case fetches around 6.67 million points and takes 403.592 seconds.
import time

import pandas as pd
from influxdb import InfluxDBClient

client = InfluxDBClient(host=host, port=port, username=user, password=password, database=dbname)
start_time = time.monotonic()
# chunked=True, no chunk_size, no extra Accept header
res = pd.DataFrame(client.query("select * from X where time > now() - 30m", chunked=True).get_points())
end_time = time.monotonic()
with outlock:  # outlock: a threading.Lock shared with the other query threads (defined elsewhere)
    print("Result from {} took {}".format(host, end_time - start_time))
    print(res)

Versions used

python --version = Python 3.7.8
influxdb.__version__ = 5.2.3

@ErlendFax commented Sep 7, 2021

Still getting ExtraData: unpack(b) received extra data, but after trying @KirannBhavaraju's suggestion it worked!

The only thing I did was remove the chunk_size=xxxx argument.

client = InfluxDBClient(blah blah)
result = client.query(q, chunked=True)

python = "^3.8"
influxdb = "5.3.1"

@Kylmakalle

The only thing I did was remove the chunk_size=xxxx argument.

Responses will be chunked by series or by every 10,000 points, whichever occurs first.
https://docs.influxdata.com/influxdb/v1.7/guides/querying_data/#chunking
