
pcap dump parsing issue - tokenizing data error #53

Closed
craig opened this issue Mar 22, 2022 · 4 comments
Labels
bug Something isn't working

Comments

craig (Contributor) commented Mar 22, 2022

This is the same dump from #51 - unfortunately, it has more issues:

$ file BT-20220314.pcap 
BT-20220314.pcap: pcap capture file, microsecond ts (little-endian) - version 2.4 (Ethernet, capture length 65536)
[INFO] 
    ____  _                     __            
   / __ \(_)____________  _____/ /_____  _____
  / / / / / ___/ ___/ _ \/ ___/ __/ __ \/ ___/
 / /_/ / (__  |__  )  __/ /__/ /_/ /_/ / /    
/_____/_/____/____/\___/\___/\__/\____/_/     

[INFO] Loading "BT-20220314.pcap"...
[INFO] Error reading PCAP file: Error tokenizing data. C error: Expected 24 fields in line 145732, saw 25

[INFO] Skipping the offending lines...
Traceback (most recent call last):
  File "/home/sb/VCS/ddos_dissector/src/reader.py", line 125, in read_pcap
    data: pd.DataFrame = pd.read_csv(output_buffer, parse_dates=['frame.time'], low_memory=False, delimiter=',')
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 581, in _read
    return parser.read(nrows)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1250, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 787, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 24 fields in line 145732, saw 25


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sb/VCS/ddos_dissector/src/main.py", line 38, in <module>
    data: pd.DataFrame = pd.concat([read_file(f, filetype) for f in args.files])  # Read the FLOW file(s) into a dataframe
  File "/home/sb/VCS/ddos_dissector/src/main.py", line 38, in <listcomp>
    data: pd.DataFrame = pd.concat([read_file(f, filetype) for f in args.files])  # Read the FLOW file(s) into a dataframe
  File "/home/sb/VCS/ddos_dissector/src/reader.py", line 184, in read_file
    return read_pcap(filename)
  File "/home/sb/VCS/ddos_dissector/src/reader.py", line 129, in read_pcap
    data: pd.DataFrame = pd.read_csv(output_buffer, parse_dates=['frame.time'], low_memory=False, delimiter=',',
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1231, in _make_engine
    return mapping[engine](f, **self.options)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 152, in __init__
    self._validate_parse_dates_presence(self.names)  # type: ignore[has-type]
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py", line 228, in _validate_parse_dates_presence
    raise ValueError(
ValueError: Missing column provided to 'parse_dates': 'frame.time'
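
For context, the failure is two-fold: the first read aborts on a row with an extra field, and the retry that skips offending lines then fails because the parsed header no longer contains 'frame.time'. Below is a minimal sketch of a more tolerant read, assuming pandas >= 1.3; this is not the dissector's actual code, and the inline CSV merely stands in for the tshark export buffer built in reader.py:

import io
import pandas as pd

# Hedged sketch: the inline CSV mimics the tshark export, with one row
# carrying a stray extra field like line 145732 in the report above.
csv_buffer = io.StringIO(
    "frame.time,ip.src,ip.dst\n"
    "2022-03-14 11:50:00,192.0.2.1,198.51.100.2\n"
    "2022-03-14 11:50:01,192.0.2.1,198.51.100.2,EXTRA\n"  # 4 fields, not 3
)

# on_bad_lines='skip' (pandas >= 1.3) drops the malformed row instead of
# raising "Expected N fields ... saw N+1".
data = pd.read_csv(csv_buffer, delimiter=',', low_memory=False,
                   on_bad_lines='skip')

# Converting timestamps after the read avoids the second failure, where
# parse_dates referenced a 'frame.time' column missing from the header.
if 'frame.time' in data.columns:
    data['frame.time'] = pd.to_datetime(data['frame.time'])

print(data)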
tvdhout (Collaborator) commented Mar 23, 2022

Interesting... What tool do you use to capture the traffic / generate the PCAP? It seems it does not capture the timestamps.

tvdhout added the bug label Mar 23, 2022
craig (Contributor, Author) commented Mar 23, 2022

Logs are created with stenoread (https://github.com/google/stenographer#querying), like this:

docker exec -it so-steno stenoread "after 2022-03-07T11:50:00Z and before 2022-03-07T12:00:00Z" -w /tmp/07032022-11_50-12_00.pcap
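
A quick way to check whether such a capture carries usable packet timestamps is to print the first few frame times, sketched here with scapy (an assumption; not part of the dissector), reading the file named in this issue:

from scapy.all import PcapReader  # assumes scapy is installed

# Hedged sketch: print the first few packet timestamps from the capture.
# Zero or missing times would point at the capture tool, per the question
# above.
with PcapReader("BT-20220314.pcap") as pcap:
    for i, pkt in enumerate(pcap):
        print(i, float(pkt.time))  # epoch seconds from the pcap record header
        if i >= 4:
            break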

tvdhout (Collaborator) commented Mar 23, 2022

Thanks, I'll check it out and see if I can figure out how to fix the dissector for this format.

In the meantime, you can use tcpdump with a file limit of 1 and a file rotation of x seconds. To capture 10 minutes of traffic: sudo tcpdump -W 1 -G 600 -w /tmp/capture10mins.pcap

tvdhout self-assigned this Mar 23, 2022
tvdhout (Collaborator) commented Apr 21, 2023

Not planned for now

tvdhout closed this as not planned Apr 21, 2023