
BUG: Occasional "tokenizing data error" when reading in large files with read_csv() #40587

Open
normanius opened this issue Mar 23, 2021 · 6 comments
Labels: Bug, IO CSV (read_csv, to_csv)

Comments


normanius commented Mar 23, 2021

I sometimes receive an Error tokenizing data. C error: ... for tables that can normally be read with read_csv() without any problems.

Attached is the .csv file sample.tar.gz with which I can reproduce the problem.

import pandas as pd
path = "sample.csv"
pd.read_csv(path, sep=";", header=[0,1])

This raises the following exception:

ParserError: Error tokenizing data. C error: Expected 15 fields in line 983050, saw 23

The tables I try to read have 23 columns, as correctly declared in the file header. However, the files contain a few corrupted lines (<0.01% of all lines) in which the data for 8 columns is omitted; for those lines, 8 delimiters are missing.

I'm working with about 100 different files containing 1M to 20M lines each. All files suffer from the same kind of ill-formatted lines. read_csv() handles those lines gracefully most of the time; only for the file provided above does it raise an exception.

I can avoid the exception in any of the following ways (see the example calls after this list):

  • Deleting a couple of unrelated (healthy) lines at the beginning of the document
  • Setting engine="python" (slow)
  • Setting low_memory=False
  • Setting error_bad_lines=False (drops a couple of lines)
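
For illustration, the last three workarounds correspond to calls like the following (a sketch based on the snippet above; error_bad_lines exists in pandas 1.2 but was deprecated in later versions):

pd.read_csv(path, sep=";", header=[0,1], engine="python")        # slow, but succeeds
pd.read_csv(path, sep=";", header=[0,1], low_memory=False)       # succeeds
pd.read_csv(path, sep=";", header=[0,1], error_bad_lines=False)  # drops the bad lines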

In summary, I think read_csv() behaves inconsistently when running with low_memory=True and the C engine.

I first thought the problem was related to issue #11166, but I'm not 100% sure.

sample.tar.gz

I'm running Python 3.8 and pandas 1.2.3. See details below.

Expected Output

No exception for file sample.csv, regardless of the settings for engine and low_memory.

System

INSTALLED VERSIONS
------------------
commit           : f2c8480af2f25efdbd803218b9d87980f416563e
python           : 3.8.0.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 18.7.0
Version          : Darwin Kernel Version 18.7.0: Fri Oct 30 12:37:06 PDT 2020; root:xnu-4903.278.44.0.2~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : None.UTF-8

pandas           : 1.2.3
numpy            : 1.20.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.3.3
setuptools       : 54.1.2
Cython           : None
pytest           : 6.2.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 7.13.0
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : None
fsspec           : 0.8.5
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.3
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.5
pandas_gbq       : None
pyarrow          : 1.0.0
pyxlsb           : None
s3fs             : None
scipy            : 1.3.2
sqlalchemy       : None
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None
normanius added the Bug and Needs Triage labels on Mar 23, 2021

nmay231 commented Mar 25, 2021

@normanius When I read the data using any of the methods that "fixed" the problem, I ended up with data in the wrong columns, e.g. the datetime columns were shifted a couple of columns to the left.

I understand the qualms about the inconsistent behavior with slightly different files, but I would think that inconsistent data is more of an issue.

In any case, I defer to those more knowledgeable about read_csv to address consistency issues.

normanius (Author) commented

Correct, the data is inconsistent. Unfortunately, I can only fix this after the fact, and pandas is my tool of choice here. The problem is actually relatively easy to fix - provided that pandas is able to read the file.

I created this report because the observed behavior of read_csv() occurs only sometimes, which may hint at a possible flaw in the algorithm. But I also understand that read_csv() cannot handle all possible kinds of inconsistencies.

meettaraviya commented

Do we know which version introduced this bug? It's really annoying that we can't get around this

jbrockmendel added the IO CSV (read_csv, to_csv) label on Jun 6, 2021
mroeschke removed the Needs Triage label on Aug 23, 2021

phofl commented Jan 28, 2022

As an explanation of what is going on here:

If low_memory is True, the file is read in chunks. Unfortunately, every chunk determines the number of columns for itself. One of your chunks starts with 15 columns, hence the error when it encounters 23.

One workaround (if you know the number of columns) is to set the names argument. This fixes the expected column count at 23 and does not raise an error.
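
A minimal sketch of that workaround, assuming the file has two header rows and 23 columns; the generated column names are placeholders, not the real ones:

import pandas as pd

names = [f"col{i}" for i in range(23)]  # 23 placeholder column names

df = pd.read_csv(
    "sample.csv",
    sep=";",
    header=None,   # don't infer the header...
    skiprows=2,    # ...skip the two original header rows instead
    names=names,   # fixes the expected number of columns at 23
)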

ronaldgevern commented

I think the problem is obvious - you're running out of memory because you're trying to load so much data into memory at once, and then process it.

You need to either:

  • get a machine with more memory, or
  • re-architect the solution to use a pipelined approach, with a generator or coroutine pipeline that does the processing stepwise over your data (see the sketch after this list).

The problem with the first approach is that it won't scale indefinitely and is expensive. The second way is the right way to do it, but needs more coding.
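
A minimal sketch of such a chunked pipeline, assuming the same semicolon-separated file; process_chunk is a hypothetical placeholder for the per-chunk work:

import pandas as pd

def process_chunk(chunk):
    ...  # hypothetical placeholder: do the per-chunk processing here

# chunksize makes read_csv return an iterator of DataFrames instead of
# loading the whole file into memory at once (usable as a context
# manager since pandas 1.2)
with pd.read_csv("sample.csv", sep=";", chunksize=100_000) as reader:
    for chunk in reader:
        process_chunk(chunk)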

Also, when pandas.parser.CParserError: Error tokenizing data is raised while reading a file written by pandas.to_csv(), it might be because there is a carriage return ('\r') in a column name, in which case to_csv() will actually write the subsequent column names into the first column of the data frame, causing a mismatch between the number of columns in the first X rows. This mismatch is one cause of the CParserError.


phofl commented Jul 12, 2022

This has nothing to do with memory issues. This is an implementation bug that occurs when reading in chunks.
