Update documentation
junyuan-chen committed Nov 17, 2022
1 parent 5520ea4 commit d47bc11
Showing 6 changed files with 192 additions and 51 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -14,7 +14,7 @@ Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"

[compat]
CategoricalArrays = "0.10"
DataAPI = "1.12"
DataAPI = "1.13"
DataFrames = "1"
DataValues = "0.4"
PrettyTables = "1, 2"
1 change: 1 addition & 0 deletions docs/make.jl
@@ -18,6 +18,7 @@ makedocs(
"Manual" => [
"Getting Started" => "man/getting-started.md",
"Table Interface" => "man/table-interface.md",
"Metadata" => "man/metadata.md",
"Value Labels" => "man/value-labels.md",
"Date and Time Values" => "man/date-and-time-values.md"
],
33 changes: 17 additions & 16 deletions docs/src/man/getting-started.md
@@ -1,12 +1,13 @@
# Getting Started

Here is an introduction to the main function of ReadStatTables.jl.
An overview of the usage of
[ReadStatTables.jl](https://github.com/junyuan-chen/ReadStatTables.jl) is provided below.
For instructions on installation, see [Installation](@ref).

## Reading a Data File

Suppose we have a Stata `.dta` file located at `data/sample.dta`.
To read this file into Julia, run
To read this file into Julia:

```@repl getting-started
using ReadStatTables
@@ -46,16 +47,18 @@ tb.mylabl
The returned array is exactly the same array holding the data for the table.
Therefore, modifying elements in the returned array
will also change the data in the table.
To avoid such changes, please make a copy of the array first (by calling [`copy`](https://docs.julialang.org/en/v1/base/base/#Base.copy)).
To avoid such changes, please
[`copy`](https://docs.julialang.org/en/v1/base/base/#Base.copy) the array first.

Some metadata for the data file are also contained in `tb`:
Metadata for the data file can be accessed from `tb`
using methods that are compatible with [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl).

```@repl getting-started
getmeta(tb)
metadata(tb)
colmetadata(tb)
colmetadata(tb, :myord)
```

See [Table Interface](@ref) for a more complete reference.

## Type Conversions

The types provided by ReadStatTables.jl should be sufficient for basic tasks.
@@ -70,7 +73,7 @@ as long as the table type can be constructed with an input following
the [Tables.jl](https://github.com/JuliaData/Tables.jl) interface.

For example, to convert the table into a `DataFrame` from
[DataFrames.jl](https://github.com/JuliaData/DataFrames.jl), we run
[DataFrames.jl](https://github.com/JuliaData/DataFrames.jl):

```@repl getting-started
using DataFrames
@@ -85,37 +88,35 @@ we need to determine whether we should keep the values or the labels.
If only the labels contain the relevant information,
we can make use of the `labels` function which returns an iterator for the labels.
For example, to convert a `LabeledArray` to a `CategoricalArray` from
[CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl),
we run
[CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl):

```@repl getting-started
using CategoricalArrays
CategoricalArray(labels(tb.mylabl))
```

Sometimes, the values have special meanings while the labels are not so important.
To obtain an array of the values without the labels,
we can call `refarray`:
To access the array of values underlying a `LabeledArray` directly:

```@repl getting-started
refarray(tb.mylabl)
```

Alternatively, for a specific element type in the output array,
we can call `convert`:
Alternatively, convert a `LabeledArray` to an array with appropriate element type:

```@repl getting-started
convert(Vector{Int}, tb.mylabl)
```

In the last example, the element type of the output array has become `Int`.
In the last example, the element type of the output array has become `Int`
while the labels are ignored.

!!! note

The array returned by `refarray` (and by `convert` if element type is not converted)
is exactly the same array underlying the `LabeledArray`.
Therefore, modifying the elements of the array
will also modify the values in the original `LabeledArray`.
will also mutate the values in the associated `LabeledArray`.
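
The aliasing behavior described in the note can be illustrated with plain Julia arrays. This is a minimal sketch using only Base rather than a `LabeledArray`; the variable names are illustrative and not part of ReadStatTables.jl.

```julia
a = [1, 2, 3]

# Element type unchanged: `convert` returns `a` itself, not a copy.
same = convert(Vector{Int}, a)
# Element type changed: `convert` allocates a new array.
other = convert(Vector{Float64}, a)
# An explicit copy is always independent of the original.
safe = copy(a)

same[1] = 10

@assert same === a        # same object, so the write is visible through `a`
@assert a[1] == 10
@assert other[1] == 1.0   # the converted array is unaffected
@assert safe[1] == 1      # the copy is unaffected
```

Calling `copy` before mutating is therefore the safe choice whenever the original data must stay intact.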

## More Options

171 changes: 171 additions & 0 deletions docs/src/man/metadata.md
@@ -0,0 +1,171 @@
# Metadata

```@setup meta
using ReadStatTables
tb = readstat("data/sample.dta")
```

File-level metadata associated with a data file are collected in a [`ReadStatMeta`](@ref),
while variable-level metadata associated with each data column
are collected in [`ReadStatColMeta`](@ref)s.
These metadata objects are stored in a [`ReadStatTable`](@ref) along with the data columns
and can be accessed via methods compatible with
[DataAPI.jl](https://github.com/JuliaData/DataAPI.jl).

## File-Level Metadata

Each `ReadStatTable` contains a `ReadStatMeta` for file-level metadata.

```@docs
ReadStatMeta
```

To retrieve the `ReadStatMeta` from the `ReadStatTable`:

```@repl meta
metadata(tb)
```

The value associated with a specific metadata key can be retrieved via:

```@repl meta
metadata(tb, "filelabel")
metadata(tb, "filelabel", style=true)
```

To obtain a complete list of metadata keys:

```@repl meta
metadatakeys(tb)
```

Metadata contained in a `ReadStatMeta` can be modified,
optionally with a metadata style set at the same time:

```@repl meta
metadata!(tb, "filelabel", "A file label", style=:note)
```

Since `ReadStatMeta` has a dictionary-like interface,
one can also directly work with it:

```@repl meta
m = metadata(tb)
keys(m)
m["filelabel"]
m["filelabel"] = "A new file label"
copy(m)
```

## Variable-Level Metadata

A `ReadStatColMeta` is associated with each data column for variable-level metadata.

```@docs
ReadStatColMeta
```

To retrieve the `ReadStatColMeta` for a specified data column contained in a `ReadStatTable`:

```@repl meta
colmetadata(tb, :mylabl)
```

The value associated with a specific metadata key can be retrieved via:

```@repl meta
colmetadata(tb, :mylabl, "label")
colmetadata(tb, :mylabl, "label", style=true)
```

To obtain a complete list of metadata keys:

```@repl meta
colmetadatakeys(tb, :mylabl)
```

Metadata contained in a `ReadStatColMeta` can be modified,
optionally with a metadata style set at the same time:

```@repl meta
colmetadata!(tb, :mylabl, "label", "A variable label", style=:note)
```

A `ReadStatColMeta` also has a dictionary-like interface:

```@repl meta
m = colmetadata(tb, :mylabl)
keys(m)
m["label"]
copy(m)
```

However, it cannot be modified directly via `setindex!`:

```@repl meta
m["label"] = "A new label"
```

Instead, since the metadata associated with each key
are stored consecutively in arrays internally,
one may directly access the underlying array for a given metadata key:

```@docs
colmetavalues
```

```@repl meta
v = colmetavalues(tb, "label")
```

Notice that changing any value in the array returned above will
affect the corresponding `ReadStatColMeta`:

```@repl meta
colmetadata(tb, :mychar, "label")
v[1] = "char"
colmetadata(tb, :mychar, "label")
```
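
Why a change to the array shows up in the per-column metadata can be sketched with a simplified layout. The `ColMetaStore` type, the `collabel` function, and the label strings below are hypothetical stand-ins for illustration only; they are not the package's actual types or data.

```julia
# Hypothetical column-wise store: one array per metadata key,
# with one entry per column in column order.
struct ColMetaStore
    names::Vector{Symbol}
    labels::Vector{String}
end

# Look up the label of a column by name.
function collabel(s::ColMetaStore, col::Symbol)
    i = findfirst(==(col), s.names)
    i === nothing && throw(ArgumentError("column $col not found"))
    return s.labels[i]
end

store = ColMetaStore([:mychar, :mynum], ["character", "numeric"])

v = store.labels   # the underlying array, not a copy
v[1] = "char"      # mutate the array in place

# The per-column lookup reads from the same array, so it sees the change.
@assert collabel(store, :mychar) == "char"
@assert collabel(store, :mynum) == "numeric"
```

Under this kind of layout, a per-column metadata object only needs to be a view into the shared arrays, which is consistent with direct `setindex!` on it being disallowed while edits to the arrays remain visible.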

## Metadata Styles

Metadata styles provide additional information on
how the metadata should be processed in certain scenarios.
`ReadStatTables.jl` does not require such information.
However, specifying metadata styles can be useful
when the metadata need to be transferred to some other object
(e.g., `DataFrame` from [DataFrames.jl](https://github.com/JuliaData/DataFrames.jl)).
Packages that implement metadata-related methods compatible with
[DataAPI.jl](https://github.com/JuliaData/DataAPI.jl)
are able to recognize the metadata contained in `ReadStatTable`.

By default, all metadata have the `:default` style.
The user-specified metadata styles
are recorded in a `Dict` based on the keys of metadata:

```@repl meta
metastyle(tb)
```

All metadata associated with keys not listed above are of `:default` style.
To modify the metadata style for those associated with a given key:

```@repl meta
metastyle!(tb, "timestamp", :note)
```

The same method is also used for variable-specific metadata.
However, since the styles are only determined by the metadata keys,
metadata associated with the same key always have the same style
and hence are not distinguished across different columns.

```@repl meta
metastyle!(tb, "label", :note)
colmetadata(tb, :mychar, "label", style=true)
colmetadata(tb, :mynum, "label", style=true)
```
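
The key-based style lookup described above can be sketched with a plain `Dict`. The names `styles` and `stylefor` are hypothetical and serve only to illustrate the idea; the package's internal representation may differ.

```julia
# Hypothetical store of user-specified styles, indexed by metadata key.
styles = Dict{String,Symbol}()

# Keys that are not listed fall back to the :default style.
stylefor(key) = get(styles, key, :default)

@assert stylefor("label") === :default

styles["label"] = :note

# The style depends on the key alone, so the "label" metadata
# of every column reports the same style.
for col in (:mychar, :mynum)
    println(col, " => ", stylefor("label"))
end
```

Storing styles by key keeps the bookkeeping small, at the cost of not being able to give the same key different styles in different columns.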

```@docs
metastyle
metastyle!
```
34 changes: 1 addition & 33 deletions docs/src/man/table-interface.md
@@ -6,7 +6,7 @@ This page provides further details on the interface of `ReadStatTable`.
ReadStatTable
```

## Accessing Data in ReadStatTable
## Data Columns

Commonly used methods are supported for working with `ReadStatTable`.
As a subtype of `Tables.AbstractColumns`,
@@ -56,35 +56,3 @@ for col in tb
println(eltype(col))
end
```

## Accessing Metadata in ReadStatTable

When calling `readstat`, a `ReadStatMeta` object,
which collects metadata from the data file,
is saved in the `ReadStatTable`.
This object can be retrieved from `ReadStatTable` via `getmeta`.

```@docs
ReadStatMeta
getmeta
```

When shown on REPL, a list of the available metadata are printed:

```@repl table
getmeta(tb)
```

Each field of `ReadStatMeta` can be accessed
either directly from `ReadStatMeta` or from `ReadStatTable`
via the corresponding accessor function.

```@docs
varlabels
varformats
val_label_keys
val_label_dict
filelabel
filetimestamp
fileext
```
2 changes: 1 addition & 1 deletion src/table.jl
@@ -289,7 +289,7 @@ function colmetadata!(tb::ReadStatTable, col::ColumnIndex,
key::Union{AbstractString, Symbol}, value; style=nothing)
_colmeta!(tb, col, key, value)
style === nothing || (metastyle!(tb, key, style))
return _colmeta(tb)
return colmetadata(tb)
end

"""
