
# [Intro to data structures](https://pandas.pydata.org/docs/user_guide/dsintro.html)

In [None]:
import numpy as np
import pandas as pd
pd.__version__

<p>We’ll start with a quick, non-comprehensive overview of the fundamental data
structures in pandas. The fundamental behavior of data
types, indexing, and axis labeling / alignment applies across all of the
objects. To get started, import NumPy and load pandas into your namespace, as in the cell above.</p>

<p>Here is a basic tenet to keep in mind: <strong>data alignment is intrinsic</strong>. The link
between labels and data will not be broken unless you break it explicitly.</p>
<p>We’ll give a brief intro to the data structures, then consider all of the broad
categories of functionality and methods in separate sections.</p>
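As a minimal sketch of what intrinsic alignment means in practice, the cell below adds two Series whose labels only partly overlap. The names `left`, `right`, and `total` and the values are illustrative assumptions, not part of the original guide.

```python
import pandas as pd

# Two Series with partially overlapping labels, listed in different orders.
left = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
right = pd.Series([10.0, 20.0, 30.0], index=["c", "b", "d"])

# Addition pairs values by label, not by position.
# Labels present in only one operand ("a" and "d") produce NaN.
total = left + right
print(total)
```

The value at `"c"` combines `3.0` from `left` with `10.0` from `right`, even though those entries sit at different positions in the two objects.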

# Series

<p><a class="reference internal" href="https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series" title="pandas.Series"><code class="xref py py-class docutils literal notranslate"><span class="pre">Series</span></code></a> is a one-dimensional labeled array capable of holding any data
type (integers, strings, floating point numbers, Python objects, etc.). The axis
labels are collectively referred to as the <strong>index</strong>. The basic method to create a Series is to call:</p>

In [None]:
data, index = [1, 2, 3], ["a", "b", "c"]  # placeholder values so the cell runs
s = pd.Series(data, index=index)

<p>Here, <code class="docutils literal notranslate"><span class="pre">data</span></code> can be many different things:</p>
<ul class="simple">
<li><p>a Python dict</p></li>
<li><p>an ndarray</p></li>
<li><p>a scalar value (like 5)</p></li>
</ul>
<p>The passed <strong>index</strong> is a list of axis labels. Thus, this separates into a few
cases depending on what <strong>data</strong> is:</p>
<p><strong>From ndarray</strong></p>
<p>If <code class="docutils literal notranslate"><span class="pre">data</span></code> is an ndarray, <strong>index</strong> must be the same length as <strong>data</strong>. If no
index is passed, one will be created having values <code class="docutils literal notranslate"><span class="pre">[0,</span> <span class="pre">...,</span> <span class="pre">len(data)</span> <span class="pre">-</span> <span class="pre">1]</span></code>.</p>

In [None]:
s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
s

In [None]:
s.index

In [None]:
pd.Series(np.random.randn(5))
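The length rule for ndarray input can be checked directly. This is a small sketch; the names `values`, `default_labels`, and `mismatch_error` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

values = np.arange(3.0)

# Without an index, pandas supplies integer labels 0 .. len(values) - 1.
default_labels = pd.Series(values)
print(list(default_labels.index))

# An index whose length differs from the data is rejected with a ValueError.
try:
    pd.Series(values, index=["a", "b"])
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
print(type(mismatch_error).__name__)
```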

<div class="admonition note">
<p class="admonition-title">Note</p>
<p>pandas supports non-unique index values. If an operation
that does not support duplicate index values is attempted, an exception
will be raised at that time. This lazy checking is almost entirely for performance reasons
(there are many instances in computations, like parts of GroupBy, where the index
is not used).</p>
</div>
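The following sketch illustrates the lazy behavior described in the note. Construction and label lookup succeed with duplicate labels, while `reindex` is used here as one example of an operation that refuses them. The variable names are illustrative assumptions.

```python
import pandas as pd

# Building a Series with a repeated label is allowed.
dup = pd.Series([1, 2, 3], index=["a", "a", "b"])
print(dup.index.is_unique)

# Selecting a duplicated label returns every matching row as a Series.
both = dup["a"]
print(both)

# reindex needs unique labels, so the error only appears at this point.
try:
    dup.reindex(["a", "b"])
    reindex_error = None
except ValueError as err:
    reindex_error = err
print(type(reindex_error).__name__)
```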

<p><strong>From dict</strong></p>
<p>Series can be instantiated from dicts:</p>

In [None]:
d = {"b": 1, "a": 0, "c": 2}
pd.Series(d)

<div class="admonition note">
<p class="admonition-title">Note</p>
<p>When the data is a dict, and an index is not passed, the <code class="docutils literal notranslate"><span class="pre">Series</span></code> index
will be ordered by the dict’s insertion order, if you’re using Python
version &gt;= 3.6 and pandas version &gt;= 0.23.</p>
<p>If you’re using Python &lt; 3.6 or pandas &lt; 0.23, and an index is not passed,
the <code class="docutils literal notranslate"><span class="pre">Series</span></code> index will be the lexically ordered list of dict keys.</p>
</div>
<p>In the example above, if you were on a Python version lower than 3.6 or a
pandas version lower than 0.23, the <code class="docutils literal notranslate"><span class="pre">Series</span></code> would be ordered by the lexical
order of the dict keys (i.e. <code class="docutils literal notranslate"><span class="pre">['a',</span> <span class="pre">'b',</span> <span class="pre">'c']</span></code> rather than <code class="docutils literal notranslate"><span class="pre">['b',</span> <span class="pre">'a',</span> <span class="pre">'c']</span></code>).</p>
<p>If an index is passed, the values in data corresponding to the labels in the
index will be pulled out. A label in the index that has no matching key in the dict gets NaN.</p>

In [None]:
d = {"a": 0.0, "b": 1.0, "c": 2.0}
pd.Series(d)

In [None]:
pd.Series(d, index=["b", "c", "d", "a"])

<div class="admonition note">
<p class="admonition-title">Note</p>
<p>NaN (not a number) is the standard missing data marker used in pandas.</p>
</div>
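To see the missing-data marker at work, the sketch below rebuilds the Series from the previous cell and uses `isna` and `dropna` to find and discard the unmatched label. The names `selected` and `missing` are assumptions for illustration.

```python
import pandas as pd

d = {"a": 0.0, "b": 1.0, "c": 2.0}

# "d" has no key in the dict, so its value is NaN.
selected = pd.Series(d, index=["b", "c", "d", "a"])

# isna() returns a boolean Series that is True where the value is missing.
missing = selected.isna()
print(missing)

# dropna() returns a new Series without the missing entries.
print(selected.dropna())
```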

<p><strong>From scalar value</strong></p>
<p>If <code class="docutils literal notranslate"><span class="pre">data</span></code> is a scalar value, an index must be
provided. The value will be repeated to match the length of <strong>index</strong>.</p>

In [None]:
pd.Series(5.0, index=["a", "b", "c", "d", "e"])

## Series is dict-like

<p>A Series is like a fixed-size dict in that you can get and set values by index
label:</p>

In [None]:
s["a"]

In [None]:
s["e"] = 12.0

In [None]:
s

In [None]:
"e" in s

In [None]:
"f" in s

In [None]:
# If a label is not in the index, a KeyError is raised:
s["f"]

<p>With the <code class="docutils literal notranslate"><span class="pre">get</span></code> method, a missing label returns None or the specified default:</p>

In [None]:
s.get("f")

In [None]:
s.get("f", np.nan)
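The dict-like access patterns in this section can be combined in one place. In this sketch, `lookup` is a hypothetical helper written for illustration (it is not a pandas function), and the Series is rebuilt with fixed values so the cell is self-contained.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])


def lookup(series, label, default=np.nan):
    """Hypothetical helper: return series[label], or default if the label is absent."""
    try:
        return series[label]
    except KeyError:
        return default


print(lookup(s, "a"))        # label present: its value is returned
print(lookup(s, "z"))        # label absent: falls back to the default
print(s.get("z") is None)    # get() without a default returns None
print(list(s.keys()))        # keys() returns the index, as with a dict
```

In everyday use `s.get(label, default)` already covers this case; the explicit `try`/`except` only makes the underlying `KeyError` visible.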