
pcap dump parsing issue - tokenizing data error #53

Closed
craig opened this issue Mar 22, 2022 · 4 comments
Labels
bug Something isn't working

Comments

craig (Contributor) commented Mar 22, 2022

This is the same dump from #51 - unfortunately, it has more issues:

$ file BT-20220314.pcap 
BT-20220314.pcap: pcap capture file, microsecond ts (little-endian) - version 2.4 (Ethernet, capture length 65536)
[INFO] 
    ____  _                     __            
   / __ \(_)____________  _____/ /_____  _____
  / / / / / ___/ ___/ _ \/ ___/ __/ __ \/ ___/
 / /_/ / (__  |__  )  __/ /__/ /_/ /_/ / /    
/_____/_/____/____/\___/\___/\__/\____/_/     

[INFO] Loading "BT-20220314.pcap"...
[INFO] Error reading PCAP file: Error tokenizing data. C error: Expected 24 fields in line 145732, saw 25

[INFO] Skipping the offending lines...
Traceback (most recent call last):
  File "/home/sb/VCS/ddos_dissector/src/reader.py", line 125, in read_pcap
    data: pd.DataFrame = pd.read_csv(output_buffer, parse_dates=['frame.time'], low_memory=False, delimiter=',')
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 581, in _read
    return parser.read(nrows)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1250, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 787, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 24 fields in line 145732, saw 25


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sb/VCS/ddos_dissector/src/main.py", line 38, in <module>
    data: pd.DataFrame = pd.concat([read_file(f, filetype) for f in args.files])  # Read the FLOW file(s) into a dataframe
  File "/home/sb/VCS/ddos_dissector/src/main.py", line 38, in <listcomp>
    data: pd.DataFrame = pd.concat([read_file(f, filetype) for f in args.files])  # Read the FLOW file(s) into a dataframe
  File "/home/sb/VCS/ddos_dissector/src/reader.py", line 184, in read_file
    return read_pcap(filename)
  File "/home/sb/VCS/ddos_dissector/src/reader.py", line 129, in read_pcap
    data: pd.DataFrame = pd.read_csv(output_buffer, parse_dates=['frame.time'], low_memory=False, delimiter=',',
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1231, in _make_engine
    return mapping[engine](f, **self.options)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 152, in __init__
    self._validate_parse_dates_presence(self.names)  # type: ignore[has-type]
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py", line 228, in _validate_parse_dates_presence
    raise ValueError(
ValueError: Missing column provided to 'parse_dates': 'frame.time'
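
For context, the failure is two-fold: the first read aborts on a row with an extra field, and the retry that skips offending lines then fails because the parsed header no longer contains 'frame.time'. Below is a minimal sketch of a more tolerant read, assuming pandas >= 1.3; this is not the dissector's actual code, and the inline CSV merely stands in for the tshark export buffer built in reader.py:

import io
import pandas as pd

# Hedged sketch: the inline CSV mimics the tshark export, with one row
# carrying a stray extra field like line 145732 in the report above.
csv_buffer = io.StringIO(
    "frame.time,ip.src,ip.dst\n"
    "2022-03-14 11:50:00,192.0.2.1,198.51.100.2\n"
    "2022-03-14 11:50:01,192.0.2.1,198.51.100.2,EXTRA\n"  # 4 fields, not 3
)

# on_bad_lines='skip' (pandas >= 1.3) drops the malformed row instead of
# raising "Expected N fields ... saw N+1".
data = pd.read_csv(csv_buffer, delimiter=',', low_memory=False,
                   on_bad_lines='skip')

# Converting timestamps after the read avoids the second failure, where
# parse_dates referenced a 'frame.time' column missing from the header.
if 'frame.time' in data.columns:
    data['frame.time'] = pd.to_datetime(data['frame.time'])

print(data)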
tvdhout (Collaborator) commented Mar 23, 2022

Interesting... What tool do you use to capture the traffic / generate the PCAP? It seems it does not capture the timestamps.

tvdhout added the bug label Mar 23, 2022
craig (Contributor, Author) commented Mar 23, 2022

Logs are created with stenoread (https://github.com/google/stenographer#querying), like this:

docker exec -it so-steno stenoread "after 2022-03-07T11:50:00Z and before 2022-03-07T12:00:00Z" -w /tmp/07032022-11_50-12_00.pcap
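
A quick way to check whether such a capture carries usable packet timestamps is to print the first few frame times, sketched here with scapy (an assumption; not part of the dissector), reading the file named in this issue:

from scapy.all import PcapReader  # assumes scapy is installed

# Hedged sketch: print the first few packet timestamps from the capture.
# Zero or missing times would point at the capture tool, per the question
# above.
with PcapReader("BT-20220314.pcap") as pcap:
    for i, pkt in enumerate(pcap):
        print(i, float(pkt.time))  # epoch seconds from the pcap record header
        if i >= 4:
            break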

tvdhout (Collaborator) commented Mar 23, 2022

Thanks, I'll check it out and see if I can figure out how to fix the dissector for this format.

In the meantime, you can use tcpdump with a file limit of 1 and a file rotation of x seconds. To capture 10 minutes of traffic: sudo tcpdump -W 1 -G 600 -w /tmp/capture10mins.pcap

tvdhout self-assigned this Mar 23, 2022
tvdhout (Collaborator) commented Apr 21, 2023

Not planned for now

tvdhout closed this as not planned Apr 21, 2023