# Reading and Writing Data in Text Format

<table>
  <thead>
    <tr>
      <th>No.</th>
      <th>Concept/Function</th>
      <th>Description</th>
      <th>Code Example</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td><code>read_csv</code></td>
      <td>Loads delimited data from a file, URL, or file-like object. Uses a comma as the default delimiter.</td>
      <td><code>df = pd.read_csv('data.csv')</code></td>
    </tr>
    <tr>
      <td>2</td>
      <td><code>read_table</code></td>
      <td>Loads delimited data from a file, URL, or file-like object. Uses a tab ('\t') as the default delimiter.</td>
      <td><code>df = pd.read_table('data.txt', sep='\t')</code></td>
    </tr>
    <tr>
      <td>3</td>
      <td><code>read_fwf</code></td>
      <td>Reads data in fixed-width column format (no delimiters).</td>
      <td><code>df = pd.read_fwf('data.fwf')</code></td>
    </tr>
    <tr>
      <td>4</td>
      <td><code>read_clipboard</code></td>
      <td>Reads data from the clipboard, useful for converting tables from web pages.</td>
      <td><code>df = pd.read_clipboard()</code></td>
    </tr>
    <tr>
      <td>5</td>
      <td><code>index_col</code></td>
      <td>Specifies column numbers or names to use as the row index in the result DataFrame.</td>
      <td><code>df = pd.read_csv('data.csv', index_col='column_name')</code></td>
    </tr>
    <tr>
      <td>6</td>
      <td><code>names</code></td>
      <td>Specifies column names for the result DataFrame.</td>
      <td><code>df = pd.read_csv('data.csv', names=['col1', 'col2'])</code></td>
    </tr>
    <tr>
      <td>7</td>
      <td><code>na_values</code></td>
      <td>Specifies a list or set of strings to consider as missing values during parsing.</td>
      <td><code>df = pd.read_csv('data.csv', na_values=['NA', 'NULL'])</code></td>
    </tr>
    <tr>
      <td>8</td>
      <td><code>sep</code></td>
      <td>Specifies the delimiter or regular expression used to split fields in each row.</td>
      <td><code>df = pd.read_csv('data.csv', sep=';')</code></td>
    </tr>
    <tr>
      <td>9</td>
      <td><code>skiprows</code></td>
      <td>Specifies the number of rows at the beginning of the file to ignore or a list of row numbers to skip.</td>
      <td><code>df = pd.read_csv('data.csv', skiprows=[0, 2, 3])</code></td>
    </tr>
  </tbody>
</table>
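
The options above can be combined in a single call. The following is a minimal sketch, assuming a small in-memory sample (the text, the column names `a`, `b`, `c`, and the note line are all made up for illustration) rather than a real file on disk:

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited text; the first line is a note, not data.
raw = io.StringIO(
    "exported by some tool\n"
    "1;2;NULL\n"
    "4;NA;6\n"
)

df = pd.read_csv(
    raw,
    sep=";",                # field delimiter
    skiprows=1,             # ignore the note line at the top
    names=["a", "b", "c"],  # no header row, so supply column names
    na_values=["NULL"],     # extra string to treat as missing
    index_col="a",          # use column 'a' as the row index
)
print(df)
```

Here `'NA'` is recognized as missing by default, while `'NULL'` is listed in `na_values` explicitly so the example does not rely on it being a built-in marker.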


<table>
  <thead>
    <tr>
      <th>No.</th>
      <th>Description</th>
      <th>Code Example</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>Loading data from a CSV file using <code>read_csv</code>:</td>
      <td><code>df = pd.read_csv('ch06/ex1.csv')</code></td>
    </tr>
    <tr>
      <td>2</td>
      <td>Loading data from a text file with a custom delimiter using <code>read_table</code>:</td>
      <td><code>df = pd.read_table('ch06/ex1.csv', sep=',')</code></td>
    </tr>
    <tr>
      <td>3</td>
      <td>Reading data from a CSV file with no header row and specifying column names using <code>names</code>:</td>
      <td><code>df = pd.read_csv('ch06/ex2.csv', names=['a', 'b', 'c', 'd', 'message'])</code></td>
    </tr>
    <tr>
      <td>4</td>
      <td>Loading data with hierarchical index from a CSV file using <code>index_col</code>:</td>
      <td><code>parsed = pd.read_csv('ch06/csv_mindex.csv', index_col=['key1', 'key2'])</code></td>
    </tr>
    <tr>
      <td>5</td>
      <td>Reading data from a text file with variable whitespace using <code>read_table</code> and a regular expression delimiter:</td>
      <td><code>result = pd.read_table('ch06/ex3.txt', sep='\\s+')</code></td>
    </tr>
    <tr>
      <td>6</td>
      <td>Skipping specific rows while reading data from a CSV file using <code>skiprows</code>:</td>
      <td><code>df = pd.read_csv('ch06/ex4.csv', skiprows=[0, 2, 3])</code></td>
    </tr>
    <tr>
      <td>7</td>
      <td>Handling missing values during data reading using <code>na_values</code>:</td>
      <td><code>result = pd.read_csv('ch06/ex5.csv', na_values=['NULL'])</code></td>
    </tr>
  </tbody>
</table>
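
The `ch06` files referenced above are not reproduced in these notes, so here is a rough, self-contained sketch of two of the patterns (a regular-expression delimiter and a hierarchical index) using made-up in-memory text. The column names `key1`, `key2`, and `value` are assumptions chosen for illustration:

```python
import io

import pandas as pd

# Hypothetical whitespace-aligned text with a varying number of spaces.
text = (
    "key1 key2  value\n"
    "one  a     1\n"
    "one  b     2\n"
    "two  a     3\n"
)

# r"\s+" splits on any run of whitespace; a list passed to index_col
# builds a hierarchical (multi-level) row index.
parsed = pd.read_table(
    io.StringIO(text),
    sep=r"\s+",
    index_col=["key1", "key2"],
)
print(parsed)

# Rows are then addressed by a (key1, key2) tuple.
print(parsed.loc[("one", "b"), "value"])
```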


# Table 6-2. read_csv/read_table function arguments

<table>
  <thead>
    <tr>
      <th>Argument</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>path</td>
      <td>String indicating filesystem location, URL, or file-like object</td>
    </tr>
    <tr>
      <td>sep or delimiter</td>
      <td>Character sequence or regular expression to use to split fields in each row</td>
    </tr>
    <tr>
      <td>header</td>
      <td>Row number to use as column names. Defaults to 0 (first row), but should be None if there is no header row</td>
    </tr>
    <tr>
      <td>index_col</td>
      <td>Column numbers or names to use as the row index in the result. Can be a single name/number or a list of them for a hierarchical index</td>
    </tr>
    <tr>
      <td>names</td>
      <td>List of column names for the result; combine with header=None</td>
    </tr>
    <tr>
      <td>skiprows</td>
      <td>Number of rows at beginning of file to ignore or list of row numbers (starting from 0) to skip</td>
    </tr>
    <tr>
      <td>na_values</td>
      <td>Sequence of values to replace with NA</td>
    </tr>
    <tr>
      <td>comment</td>
      <td>Single character that marks the start of a comment; the rest of the line after it is ignored</td>
    </tr>
    <tr>
      <td>parse_dates</td>
      <td>Attempt to parse data to datetime; False by default. If True, will attempt to parse the index. Otherwise, can specify a list of column numbers or names to parse. If an element of the list is a tuple or list, will combine multiple columns together and parse to date (for example, if date/time is split across two columns)</td>
    </tr>
    <tr>
      <td>keep_date_col</td>
      <td>If joining columns to parse date, keep the joined columns. Default False</td>
    </tr>
    <tr>
      <td>converters</td>
      <td>Dict mapping column numbers or names to functions. For example, {'foo': f} would apply the function f to all values in the 'foo' column</td>
    </tr>
    <tr>
      <td>dayfirst</td>
      <td>When parsing potentially ambiguous dates, treat as international format (e.g. 7/6/2012 -> June 7, 2012). Default False</td>
    </tr>
    <tr>
      <td>date_parser</td>
      <td>Function to use to parse dates</td>
    </tr>
    <tr>
      <td>nrows</td>
      <td>Number of rows to read from beginning of file</td>
    </tr>
    <tr>
      <td>iterator</td>
      <td>Return a TextFileReader object for reading the file piecemeal</td>
    </tr>
    <tr>
      <td>chunksize</td>
      <td>For iteration, size of file chunks</td>
    </tr>
    <tr>
      <td>skipfooter</td>
      <td>Number of lines to ignore at end of file</td>
    </tr>
    <tr>
      <td>verbose</td>
      <td>Print various parser output information, like the number of missing values placed in non-numeric columns</td>
    </tr>
    <tr>
      <td>encoding</td>
      <td>Text encoding for unicode. For example 'utf-8' for UTF-8 encoded text</td>
    </tr>
    <tr>
      <td>squeeze</td>
      <td>If the parsed data only contains one column return a Series</td>
    </tr>
    <tr>
      <td>thousands</td>
      <td>Separator for thousands, e.g. ',' or '.'</td>
    </tr>
  </tbody>
</table>
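
Several of the less common arguments in the table can be seen working together. This is only a sketch under stated assumptions: the data, the column names (`name`, `amount`, `when`), and the choice of `str.upper` as a converter are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical export: a comment line, ';' as the field separator,
# ',' as the thousands separator, and day-first dates.
text = (
    "# exported by hand\n"
    "name;amount;when\n"
    "alice;1,200;7/6/2012\n"
    "bob;3,450;8/6/2012\n"
    "carol;10;9/6/2012\n"
)

df = pd.read_csv(
    io.StringIO(text),
    sep=";",
    comment="#",                     # drop everything after '#'
    thousands=",",                   # read '1,200' as 1200
    converters={"name": str.upper},  # apply str.upper to each 'name' value
    parse_dates=["when"],            # parse this column to datetime
    dayfirst=True,                   # 7/6/2012 means 7 June 2012
    nrows=2,                         # read only the first two data rows
)
print(df)
```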


# Reading Text Files in Pieces

<table>
    <tr>
        <th>Method</th>
        <th>Description</th>
    </tr>
    <tr>
        <td><code>pd.read_csv('ch06/ex6.csv')</code></td>
        <td>Reads the entire CSV file into a DataFrame. Displays a summary of the DataFrame's structure.</td>
    </tr>
    <tr>
        <td><code>pd.read_csv('ch06/ex6.csv', nrows=5)</code></td>
        <td>Reads a specific number of rows (5 in this case) from the CSV file into a DataFrame.</td>
    </tr>
    <tr>
        <td><code>chunker = pd.read_csv('ch06/ex6.csv', chunksize=1000)</code></td>
        <td>Creates a TextFileReader object that lets you iterate over the CSV file in chunks of 1000 rows each.</td>
    </tr>
    <tr>
        <td><code>for piece in chunker:</code></td>
        <td>Iterates over the chunks yielded by the TextFileReader object. Each chunk is a DataFrame, so you can perform the usual operations on it.</td>
    </tr>
    <tr>
        <td><code>chunk = chunker.get_chunk(500)</code></td>
        <td>Reads and returns the next 500 rows from the TextFileReader object as a DataFrame.</td>
    </tr>
</table>
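
A typical use of chunked reading is to aggregate over a file that is too large to load at once. The sketch below assumes a tiny in-memory stand-in for such a file (10 rows with a made-up `key` column) and a deliberately small `chunksize` so the chunk boundaries are visible:

```python
import io

import pandas as pd

# Hypothetical stand-in for a large file: 10 rows, keys cycle a, b, c.
text = "key,value\n" + "".join(f"{'abc'[i % 3]},{i}\n" for i in range(10))

chunker = pd.read_csv(io.StringIO(text), chunksize=4)

totals = pd.Series(dtype="float64")  # running count per key
chunk_sizes = []
for piece in chunker:
    # Each piece is a DataFrame holding at most `chunksize` rows.
    chunk_sizes.append(len(piece))
    counts = piece["key"].value_counts()
    totals = totals.add(counts, fill_value=0)

totals = totals.sort_values(ascending=False)
print(chunk_sizes)
print(totals)
```

Passing `fill_value=0` keeps a key from turning into NaN when it is missing from either the running total or the current chunk.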


# Writing Data Out to Text Format

# Manually Working with Delimited Formats

<table>
  <thead>
    <tr>
      <th>Argument</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>delimiter</td>
      <td>One-character string to separate fields. Defaults to ','.</td>
    </tr>
    <tr>
      <td>lineterminator</td>
      <td>Line terminator for writing, defaults to '\r\n'. Reader ignores this and recognizes cross-platform line terminators.</td>
    </tr>
    <tr>
      <td>quotechar</td>
      <td>Quote character for fields with special characters (like a delimiter). Default is '"'.</td>
    </tr>
    <tr>
      <td>quoting</td>
      <td>Quoting convention. Options include csv.QUOTE_ALL (quote all fields), csv.QUOTE_MINIMAL (only fields with special characters like the delimiter), csv.QUOTE_NONNUMERIC, and csv.QUOTE_NONE (no quoting). See Python’s documentation for full details. Defaults to QUOTE_MINIMAL.</td>
    </tr>
    <tr>
      <td>skipinitialspace</td>
      <td>Ignore whitespace after each delimiter. Default False.</td>
    </tr>
    <tr>
      <td>doublequote</td>
      <td>How to handle quoting character inside a field. If True, it is doubled. See online documentation for full detail and behavior.</td>
    </tr>
    <tr>
      <td>escapechar</td>
      <td>String to escape the delimiter if quoting is set to csv.QUOTE_NONE. Disabled by default.</td>
    </tr>
  </tbody>
</table>
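
These options are usually bundled into a subclass of `csv.Dialect` and passed to `csv.reader` or `csv.writer`. A minimal sketch, assuming a pipe-delimited format; the class name `PipeDialect` and the sample rows are made up for illustration:

```python
import csv
import io


class PipeDialect(csv.Dialect):
    """Hypothetical dialect: pipe-delimited, minimal quoting."""

    delimiter = "|"
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL
    lineterminator = "\n"
    doublequote = True
    skipinitialspace = False


# Write two rows; the second has fields containing the delimiter
# and the quote character, so QUOTE_MINIMAL must quote them.
buffer = io.StringIO()
writer = csv.writer(buffer, dialect=PipeDialect)
writer.writerow(["one", "two", "three"])
writer.writerow(["1", "has|pipe", 'say "hi"'])
written = buffer.getvalue()
print(written)

# Reading back with the same dialect recovers the original fields.
rows = list(csv.reader(io.StringIO(written), dialect=PipeDialect))
print(rows)
```

The same options can also be passed as keyword arguments, for example `csv.reader(f, delimiter="|")`, when defining a class is more than the task needs.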


# JSON Data

<h2>JSON Methods and Examples</h2>

<table>
  <tr>
    <th>Method/Function</th>
    <th>Description</th>
    <th>Parameters and Possible Values</th>
    <th>Example</th>
    <th>Output (if applicable)</th>
  </tr>
  <tr>
    <td><code>json.dumps(obj, indent=None, separators=None, sort_keys=False)</code></td>
    <td>Serialize a Python object into a JSON formatted string.</td>
    <td>- <code>obj</code>: The Python object to be serialized.<br>- <code>indent</code>: Number of spaces for indentation. Default: <code>None</code>.<br>- <code>separators</code>: Tuple of (item separator, key separator). Default: <code>(", ", ": ")</code> when <code>indent</code> is <code>None</code>, otherwise <code>(",", ": ")</code>.<br>- <code>sort_keys</code>: Sort dictionary keys. Default: <code>False</code>.</td>
    <td><pre><code>import json
data = {"name": "John", "age": 30, "city": "New York"}
json_string = json.dumps(data, indent=2, sort_keys=True)
print(json_string)</code></pre></td>
    <td><pre>{<br>  "age": 30,<br>  "city": "New York",<br>  "name": "John"<br>}</pre></td>
  </tr>
  <tr>
    <td><code>json.loads(json_string)</code></td>
    <td>Deserialize a JSON string into a Python object.</td>
    <td>- <code>json_string</code>: The JSON formatted string to be deserialized.</td>
    <td><pre><code>import json
json_string = '{"name": "John", "age": 30, "city": "New York"}'
data = json.loads(json_string)
print(data["name"])</code></pre></td>
    <td>John</td>
  </tr>
  <tr>
    <td><code>json.dump(obj, fp, indent=None, separators=None, sort_keys=False)</code></td>
    <td>Serialize a Python object and write it to a file-like object.</td>
    <td>- <code>obj</code>: The Python object to be serialized.<br>- <code>fp</code>: File-like object to write JSON data to.<br>- <code>indent</code>: Number of spaces for indentation. Default: <code>None</code>.<br>- <code>separators</code>: Tuple of (item separator, key separator). Default: <code>(", ", ": ")</code> when <code>indent</code> is <code>None</code>, otherwise <code>(",", ": ")</code>.<br>- <code>sort_keys</code>: Sort dictionary keys. Default: <code>False</code>.</td>
    <td><pre><code>import json
data = {"name": "John", "age": 30, "city": "New York"}
with open("data.json", "w") as fp:
    json.dump(data, fp, indent=2)</code></pre></td>
    <td>Creates "data.json" containing <code>data</code> as JSON indented by two spaces.</td>
  </tr>
  <tr>
    <td><code>json.load(fp)</code></td>
    <td>Deserialize JSON data from a file-like object.</td>
    <td>- <code>fp</code>: File-like object to read JSON data from.</td>
    <td><pre><code>import json
with open("data.json", "r") as fp:
    data = json.load(fp)
    print(data["name"])</code></pre></td>
    <td>John</td>
  </tr>
</table>
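
As a quick check on how these functions fit together, the sketch below round-trips a dictionary through a compact JSON string. The `pets` key and the choice of compact separators are assumptions added for illustration; they show how Python's `None` maps to JSON `null` and back.

```python
import json

# Hypothetical record; None has no JSON equivalent other than null.
record = {"name": "John", "age": 30, "city": "New York", "pets": None}

# Compact separators drop the spaces after ',' and ':'.
compact = json.dumps(record, separators=(",", ":"), sort_keys=True)
print(compact)

# Parsing the string gives back an equal Python object.
restored = json.loads(compact)
print(restored == record)
```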


# XML and HTML: Web Scraping
# Binary Data Formats
# Interacting with HTML and Web APIs
# Interacting with Databases