## Snowflake Architecture

<p>Snowflake’s architecture is a <b>hybrid of traditional shared-disk and shared-nothing database architectures</b>. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the platform. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally. This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.</p>

<img src='https://docs.snowflake.com/en/_images/architecture-overview.png'></img>

<b>Snowflake’s unique architecture consists of three key layers:</b>
<ul>
    <li>Database Storage</li>
    <li>Query Processing</li>
    <li>Cloud Services</li>
</ul>

<h3>Database Storage</h3>
<p>
When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format. Snowflake stores this optimized data in cloud storage.
</p>
<p>
Snowflake manages all aspects of how this data is stored — the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake. <b>The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake.</b>
</p>

<h3>Query Processing</h3>
<p>Query execution is performed in the processing layer. Snowflake processes queries using “virtual warehouses”. Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider.</p>
<p>Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.</p>
<p><b>Note:</b> In Snowflake, the term “virtual warehouse” has a different meaning than it does elsewhere: it is simply a compute engine.</p>
<h3>Cloud Services</h3>
<p>The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch. The cloud services layer also runs on compute instances provisioned by Snowflake from the cloud provider.</p>
<p>Services managed in this layer include:</p>
<ul>
    <li>Authentication</li>
    <li>Infrastructure management</li>
    <li>Metadata management</li>
    <li>Query parsing and optimization</li>
    <li>Access control</li>
</ul>    

<h3>Connecting to Snowflake</h3>
<p>Snowflake supports multiple ways of connecting to the service:</p>
<ul>
    <li>A web-based user interface from which all aspects of managing and using Snowflake can be accessed.</li>
    <li>Command line clients (e.g. SnowSQL) which can also access all aspects of managing and using Snowflake.</li>
    <li>ODBC and JDBC drivers that can be used by other applications (e.g. Tableau) to connect to Snowflake.</li>
    <li>Native connectors (e.g. Python, Spark) that can be used to develop applications for connecting to Snowflake.</li>
    <li>Third-party connectors that can be used to connect applications such as ETL tools (e.g. Informatica) and BI tools (e.g. ThoughtSpot) to Snowflake.</li>
</ul>

<h3>Micro-partitions</h3>
<ul>
<li>Data in Snowflake is automatically organized into partitions known as micro-partitions.</li>
<li>Micro-partitions are managed automatically and require no intervention by the user.</li>
<li>As the name suggests, micro-partitions are relatively small: each one generally
contains <b>50 MB to 500 MB</b> of uncompressed data. The data actually stored is smaller, because Snowflake always stores data compressed.</li>
<li>Micro-partitions are added to a table in the order in which the data arrives.<br>
When additional data is added to a table, one or more new micro-partitions are created to accommodate it,
depending on the size of the data.</li>
<li>Micro-partitions are immutable, which means they cannot be changed once created.<br>
Any update to existing data or load of new data into a table results in new micro-partitions
being created.<br>
As a consequence, rows with similar values are not guaranteed to be stored in the same physical micro-partition.
</li>
<li>Snowflake must keep track of which range of data is in which micro-partition so that it can use that information
for efficient query processing.</li>
<li>For this purpose, Snowflake maintains several kinds of metadata for each table.<br>
It stores the range of values, that is, the minimum and maximum value, for each column in each micro-partition.<br>
With this metadata, Snowflake can decide which micro-partitions to read when processing a query and skip the rest.<br>
It also stores the number of distinct values for each column, along with certain
other information that assists in query optimization.</li>
<li>Within each micro-partition, the data is stored in a columnar format.<br>
Each column is stored compressed, and Snowflake automatically determines the most appropriate
compression algorithm.<br>
Storing data in columnar format lets Snowflake optimize queries further when only a subset of
columns is accessed.<br>
For example, a query that selects only some of a table's columns needs to read only the data for those columns.</li>
</ul>
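
<p>The points above (arrival-order partitions, immutability, per-column min/max metadata, and columnar storage) can be illustrated with a small Python model. This is only a toy sketch and not how Snowflake is implemented internally: the names <code>MicroPartition</code>, <code>Table</code>, <code>load</code>, and <code>select</code> are hypothetical, and a partition here holds three rows rather than 50 MB to 500 MB of data.</p>

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class MicroPartition:
    """Toy stand-in for a micro-partition (hypothetical, for illustration only)."""
    columns: Dict[str, Tuple[Any, ...]]  # column name -> values, stored column by column
    min_max: Dict[str, Tuple[Any, Any]]  # column name -> (min, max) metadata


def make_partition(rows: List[Dict[str, Any]]) -> MicroPartition:
    """Pivot rows into columns and record min/max metadata for each column."""
    names = list(rows[0].keys())
    columns = {n: tuple(r[n] for r in rows) for n in names}
    min_max = {n: (min(v), max(v)) for n, v in columns.items()}
    return MicroPartition(columns, min_max)


class Table:
    def __init__(self) -> None:
        self.partitions: List[MicroPartition] = []

    def load(self, rows: List[Dict[str, Any]], rows_per_partition: int = 3) -> None:
        """Append new partitions in arrival order; existing ones are never touched."""
        for i in range(0, len(rows), rows_per_partition):
            self.partitions.append(make_partition(rows[i:i + rows_per_partition]))

    def select(self, wanted: List[str], column: str, value: Any):
        """Return (matching rows, number of partitions actually scanned)."""
        scanned = 0
        result = []
        for p in self.partitions:
            low, high = p.min_max[column]
            if not (low <= value <= high):
                continue  # pruned: metadata alone proves there is no match here
            scanned += 1
            for idx, v in enumerate(p.columns[column]):
                if v == value:
                    # Only the requested columns are read from the partition.
                    result.append({c: p.columns[c][idx] for c in wanted})
        return result, scanned


table = Table()
table.load([
    {"id": 1, "region": "EU", "amount": 10},
    {"id": 2, "region": "EU", "amount": 20},
    {"id": 3, "region": "US", "amount": 30},
    {"id": 4, "region": "US", "amount": 40},
    {"id": 5, "region": "EU", "amount": 50},
    {"id": 6, "region": "US", "amount": 60},
])

rows, scanned = table.select(["amount"], "id", 5)
print(rows, scanned)  # [{'amount': 50}] 1
```

<p>In this sketch the first partition covers <code>id</code> values 1 to 3, so the lookup for <code>id = 5</code> skips it using metadata alone and scans only the second partition. Loading another row with an existing <code>id</code> later would create a third partition with an overlapping range, which is why similar values are not guaranteed to end up in the same partition.</p>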